CHAPTER 1 


INTRODUCTION 


Probabilistic and deterministic descriptions of natural phenomena have coexisted for ages. Very often, probabilistic descriptions are assumed to be derivable from the underlying determinism, although this is rarely carried out. This is because probabilistic descriptions are invoked either when the deterministic model equations become hard to solve exactly due to many degrees of freedom, or when the probabilistic models themselves are adequate for the problem at hand. Just as probabilistic descriptions are contrived conveniences, so also are deterministic descriptions in the form of differential equations, because scientific theories are not discoveries of the laws of nature but rather inventions of the human mind. The only difference between the two descriptions then appears to be that, while deterministic descriptions examine the results of a single trial and predict the future in certain terms, probabilistic descriptions are presented as statements of average behaviour, and the future is predicted with a quantitative element of uncertainty.

However, there has been a distortion of facts due to blind faith in, or excessive use of, probabilistic descriptions. Many people speak of random processes as though they were a fundamental source of randomness, which is misleading. The only truly fundamental source of randomness known is the uncertainty principle. However, as a matter of course, events like roulette wheel spins, dice throws and coin tosses are presumed to be completely random. It was pointed out early in this century by Poincaré that many of the so-called classic examples of randomness are in fact quite deterministic and involve only a few degrees of freedom.
At this juncture, a brief discussion on randomness is in order. Conventionally, randomness may be deduced from an autocorrelation that is an impulse at zero delay or, equivalently, from a flat spectrum. However, many time series which pass these conventional tests of randomness are now known to be deterministic, inasmuch as they can be realized by simple differential or difference equations. Hence, a more fundamental notion of randomness, relevant to the development of the viewpoint followed here, is necessary. This is based on algorithmic complexity theory, developed largely by Kolmogorov and Chaitin [23,24]. They define the randomness of a particular number or time series as the length of the shortest algorithm required to specify that number or time series. If a number or time series is completely random, the only way to specify it is to write it down, so the algorithm will be as long as the number or time series itself. On the other hand, a more ordered time series can be generated by an algorithm much shorter than the actual time series.
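Algorithmic complexity itself is not computable, but the length of a losslessly compressed encoding gives an upper bound on it and is a common practical proxy. The sketch below uses Python's standard `zlib` module and two hypothetical byte sequences to show the contrast the definition draws: an ordered (periodic) series admits a far shorter description than an irregular one of the same length. Strictly, the pseudo-random sequence is itself produced by a short program (a seeded generator), which is exactly why compressed length is only an upper bound on algorithmic complexity.

```python
import random
import zlib

def compressed_length(data: bytes) -> int:
    """Length in bytes of a zlib encoding: an upper bound on description length, used as a proxy."""
    return len(zlib.compress(data, 9))

n = 10000
ordered = bytes(i % 16 for i in range(n))                 # generated by a very short rule
rng = random.Random(1)
irregular = bytes(rng.randrange(256) for _ in range(n))   # no pattern that zlib can exploit

print(compressed_length(ordered), compressed_length(irregular))
```

The ordered sequence shrinks to a tiny fraction of its length, while the irregular one does not compress at all.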

The apparent randomness, or more appropriately, the unpredictability of the outcome of roulette wheel spins, dice throws and coin tosses comes from sensitive dependence on initial conditions, i.e., a small perturbation causes a much larger effect at a later time. When sensitive dependence comes in a sustained way, i.e., there is no settling down as in a coin toss, it is called chaos (precise definitions will be given in Chapter 2). In this sense, roulette wheel spins etc. have sensitive dependence on initial conditions, but are not chaotic because they come to rest. Chaos is, thus, a special case of sensitive dependence on initial conditions where there is also sustained motion. Chaos is defined in the context of deterministic dynamics and, therefore, in a strict sense, it is not random. Such a conflict does not arise under the above definition of randomness, although chaotic trajectories may pass the other classic tests of randomness.
Ultimately, then, uncertainty originates from something external to the dynamics, e.g., measurement error or external "noise". But sensitive dependence exaggerates uncertainty, so that small uncertainties in initial conditions turn into large ones. Since chaos amplifies noise exponentially, any uncertainty is amplified to macroscopic proportions in finite time, and short-term determinism becomes long-term unpredictability.
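The exponential amplification of a small initial uncertainty can be seen in the logistic map $x_{n+1} = 4x_n(1-x_n)$. This is a standard chaotic example chosen here for illustration only; it is not a model used in the thesis. The map's Lyapunov exponent is $\ln 2$, so two initial conditions that differ by $10^{-10}$ separate roughly like $10^{-10} \cdot 2^n$. The difference therefore reaches order one after a few tens of iterations, and prediction is lost beyond that horizon. The function names and starting point below are hypothetical.

```python
def logistic(x):
    """Fully chaotic logistic map x -> 4 x (1 - x) on [0, 1]."""
    return 4.0 * x * (1.0 - x)

def separation_history(x0, delta, steps):
    """Absolute difference |x_n - x'_n| for two orbits started delta apart."""
    a, b = x0, x0 + delta
    history = []
    for _ in range(steps):
        a, b = logistic(a), logistic(b)
        history.append(abs(a - b))
    return history

sep = separation_history(0.3, 1e-10, 100)
# first iteration at which the two orbits differ by more than 0.1
first_large = next(n for n, d in enumerate(sep, start=1) if d > 0.1)
print(first_large)
```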

This thesis is concerned with some aspects of time series modeling and the role that chaotic models can play in this venture. Let us first briefly review some of the popular time series modeling schemes.

1.1 Review of some Signal Modeling Schemes 

A signal or time series is a manifestation of some phenomenon under investigation. Generally, signals are continuous-time in nature, but the process of observation at intervals makes them discrete-time. Modeling of signals is done for various reasons, primary among them being to predict the future behaviour of the phenomenon, to filter out an unwanted component, or to smooth the signal to reduce external noise or measurement error that is not a part of the signal.

Signal modeling schemes can be broadly divided into two categories, namely auto-regressive modeling and state-space modeling. The theory of auto-regressive modeling is rooted in probabilistic considerations, i.e., the signal is considered to be a sample function or a realization of a random process. The theory of state-space modeling, in contrast, has developed from both deterministic and probabilistic viewpoints. Let us now discuss these two modeling schemes. More emphasis will be given to the deterministic state-space modeling approach, as it extends naturally to the signal modeling schemes that will be pursued in the thesis.
1. Auto-regressive modeling. Consider a stationary random process $\{y_0, y_1, \ldots\}$. An auto-regressive (AR) process is defined as one having the form

$$ y_n + \alpha_1 y_{n-1} + \cdots + \alpha_p y_{n-p} = e_n \tag{1.1.1} $$

where $\{e_n\}$ is a white noise sequence such that

$$ E[e_n e_i] = 0, \quad n \neq i; \qquad E[e_n y_i] = 0, \quad i < n \tag{1.1.2} $$

and $p$ is the order of auto-regression.


Now, given a time series $\{y_0, y_1, \ldots\}$, the assumption that it is a realization of a stationary random process allows one to construct an auto-regressive model

$$ y_n = \sum_{k=1}^{p} a_k\, y_{n-k} + \varepsilon_n \tag{1.1.3} $$

The order of auto-regression $p$ and the coefficients $a_k$ are chosen so that the residuals

$$ \varepsilon_n = y_n - \hat{y}_n, \qquad n = 0, 1, \ldots \tag{1.1.4} $$

approximate a white noise sequence. Here $\hat{y}_n$ is the linear least squares error estimate of $y_n$ given $\{y_{n-p}, \ldots, y_{n-1}\}$:

$$ \hat{y}_n = \sum_{k=1}^{p} a_k\, y_{n-k} \tag{1.1.5} $$

Obtaining the model coefficients $\{a_k\}$ then requires the solution of

$$ \sum_{k=1}^{p} a_k\, R(|i-k|) = R(i), \qquad 1 \le i \le p \tag{1.1.6} $$

where $R(k)$ denotes the autocorrelation function of the stationary random process $\{y_0, y_1, \ldots\}$.

An estimate of the autocorrelation function can be made from a finite length $N$ of actual data using the following:

$$ R(k) = \sum_{m=0}^{N-1-k} y(m)\, y(m+k) \tag{1.1.7} $$


Such a scheme is also referred to as a linear predictive coding (LPC) scheme. There are various formulations of the LPC scheme, of which we shall briefly discuss the autocorrelation and the covariance methods.


In the autocorrelation method, modeling is done using blocks of data such that values outside the block are assumed to be zero. Thus, given one such block $\{y_0, \ldots, y_{N-1}\}$, the linear least squares problem is to minimize

$$ E = \sum_{n=-\infty}^{\infty} \varepsilon_n^2 = \sum_{n=0}^{N+p-1} \varepsilon_n^2 \tag{1.1.8} $$

where $\varepsilon_n$ is given by equation (1.1.4) and is zero outside the interval $[0, N+p-1]$. This leads to the requirement of solving equation (1.1.6) using equation (1.1.7). In this approach, the residuals $\varepsilon_n$ are not well approximated by white noise because of the arbitrary assumption that the signal is zero outside the finite blocklength. In particular, the residuals tend to be large near the ends of the block.
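The normal equations (1.1.6) have a Toeplitz matrix, and this is what makes the autocorrelation method cheap. Such a system can be solved by the Levinson-Durbin recursion in $O(p^2)$ operations instead of by general elimination. The sketch below is illustrative only. The function names `autocorr` and `levinson_durbin`, and the synthetic AR(2) test signal with coefficients 0.75 and -0.5, are assumptions and are not taken from the thesis.

```python
import random

def autocorr(y, p):
    """Autocorrelation estimate R(k) = sum_{m=0}^{N-1-k} y(m) y(m+k), k = 0..p (eq. 1.1.7).
    Samples outside the block are implicitly zero, as in the autocorrelation method."""
    n = len(y)
    return [sum(y[m] * y[m + k] for m in range(n - k)) for k in range(p + 1)]

def levinson_durbin(r, p):
    """Solve sum_k a_k R(|i-k|) = R(i), 1 <= i <= p (eq. 1.1.6), exploiting the Toeplitz structure.
    Returns the predictor coefficients [a_1, ..., a_p] and the final prediction-error energy."""
    a = [0.0] * (p + 1)      # a[0] is unused; a[k] multiplies y_{n-k}
    err = r[0]               # prediction-error energy of the order-0 predictor
    for i in range(1, p + 1):
        # reflection coefficient of the order-i predictor
        kappa = (r[i] - sum(a[k] * r[i - k] for k in range(1, i))) / err
        prev = a[:]
        a[i] = kappa
        for k in range(1, i):
            a[k] = prev[k] - kappa * prev[i - k]
        err *= 1.0 - kappa * kappa
    return a[1:], err

# Hypothetical test signal: y_n = 0.75 y_{n-1} - 0.5 y_{n-2} + unit-variance white noise.
rng = random.Random(0)
y = [0.0, 0.0]
for _ in range(20000):
    y.append(0.75 * y[-1] - 0.5 * y[-2] + rng.gauss(0.0, 1.0))

coeffs, err = levinson_durbin(autocorr(y, 2), 2)
print(coeffs)           # close to [0.75, -0.5]
print(err / len(y))     # close to the unit noise variance
```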


An alternative to the above approach is the covariance method, in which the interval over which the total residual error is minimized is fixed, i.e.,

$$ E = \sum_{n=0}^{N-1} \varepsilon_n^2 \tag{1.1.9} $$

This leads to a set of equations slightly different from (1.1.6) and (1.1.7), namely

$$ \sum_{k=1}^{p} a_k\, \phi(i,k) = \phi(i,0), \qquad 1 \le i \le p \tag{1.1.10} $$

where

$$ \phi(i,k) = \sum_{m=-k}^{N-k-1} y(m)\, y(m+k-i), \qquad 1 \le i \le p,\; 0 \le k \le p \tag{1.1.11} $$

Thus, a block of data $\{y_{-p}, \ldots, y_0, \ldots, y_{N-1}\}$ is required in this case. Since the data is not arbitrarily truncated to zero outside a fixed blocklength, the problem of large residuals near the ends of the block is overcome. However, equations (1.1.10)-(1.1.11) do not have the Toeplitz structure of equations (1.1.6)-(1.1.7). Hence, obtaining the parameters $\{a_k\}$ with the covariance method requires more computation than with the autocorrelation method.
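The extra work can be seen by assembling equations (1.1.10)-(1.1.11) and solving them with general Gaussian elimination, since no Toeplitz shortcut is available. The matrix $\phi(i,k)$ is symmetric, so a Cholesky factorization would be the usual refinement. The names `covariance_lpc` and `solve`, and the test sinusoid, are hypothetical. Because no zero-padding is assumed, a noise-free sinusoid is recovered exactly: it obeys $y_n = 2\cos\omega\, y_{n-1} - y_{n-2}$, so the order-2 predictor leaves zero residual.

```python
import math

def solve(A, b):
    """Gaussian elimination with partial pivoting for a small dense system A x = b."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]     # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                    # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def covariance_lpc(y, p, start, N):
    """Covariance-method predictor of order p for the window y[start], ..., y[start+N-1].
    The p samples preceding the window are used rather than assumed zero (needs start >= p)."""
    def phi(i, k):
        # phi(i, k) = sum_{m=0}^{N-1} y(m-i) y(m-k); equivalent to eq. (1.1.11) after m -> m - k
        return sum(y[start + m - i] * y[start + m - k] for m in range(N))
    A = [[phi(i, k) for k in range(1, p + 1)] for i in range(1, p + 1)]
    b = [phi(i, 0) for i in range(1, p + 1)]
    return solve(A, b)

omega = 0.3
y = [math.cos(omega * n + 0.1) for n in range(120)]
a = covariance_lpc(y, p=2, start=2, N=100)
print(a)    # [2 cos(0.3), -1] up to rounding error
```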


A moving-average (MA) scheme for a stationary random process $\{y_0, y_1, \ldots\}$ is one where

$$ y_n = \varepsilon_n - \sum_{j=1}^{q} b_j\, \varepsilon_{n-j} \tag{1.1.12} $$

that is, $y_n$ can be expressed in terms of the residuals $\varepsilon_n$ as given by equation (1.1.4).


Finally, it was shown by Wold in 1954 that any discrete, stationary time series $\{y_n\}$ can be broken into

$$ y_n = \sum_{k=1}^{p} a_k\, y_{n-k} - \sum_{j=1}^{q} b_j\, \varepsilon_{n-j} + \sum_{r=1}^{R} \sum_{s=1}^{S} A_{rs} \cos\!\left( \frac{2\pi n s}{T_r} + \theta_{rs} \right) \tag{1.1.13} $$


For practical purposes, the last term in equation (1.1.13) is included in the error term. This then gives the auto-regressive moving-average (ARMA) modeling scheme, which can be expressed mathematically as follows:

$$ y_n = \sum_{k=1}^{p} a_k\, y_{n-k} - \sum_{j=1}^{q} b_j\, \varepsilon_{n-j} + \varepsilon_n \tag{1.1.14} $$

The AR, MA and ARMA modeling schemes have been studied in great detail. Variations of these schemes have also been studied to include other features of the data, such as seasonality, trend and cyclicality, and the algorithms have been extended to cope with non-stationary processes. Finally, the availability of efficient algorithms to implement them has made these modeling schemes very popular in diverse fields.
2. State-space modeling. The review that follows, which is based on the development given in [70], is restricted to deterministic models only, i.e., the time series is considered to be the output of a deterministic process. Stochastic state-space modeling schemes are not discussed in this review because they are not directly related to the thesis work. Consequently, some very popular schemes, e.g., those based on Kalman filtering, are not discussed.


Generally, any natural phenomenon can be modeled as a dynamical system, which can then be described by a system of differential equations. Three successively more general types of model equations will be considered. Let $X(t)$ be a vector of functions, i.e.,

$$ X(t) = [\,x_0(t)\;\; x_1(t)\;\; \cdots\;\; x_m(t)\,]^T \tag{1.1.15} $$

and define its time derivative as

$$ \dot{X}(t) = [\,\dot{x}_0(t)\;\; \dot{x}_1(t)\;\; \cdots\;\; \dot{x}_m(t)\,]^T \tag{1.1.16} $$

Then the three classes of differential equations are

$$ \dot{X}(t) = A\,X(t) \tag{1.1.17} $$

$$ \dot{X}(t) = A(t)\,X(t) \tag{1.1.18} $$

$$ \dot{X}(t) = F[X(t), t] \tag{1.1.19} $$


The vector $X(t)$ is called the state vector of the model, and its components are the state variables. In equation (1.1.17), $\dot{X}(t)$ is related to $X(t)$ by a linear transformation defined by the constant matrix $A$. Such a system is accordingly called a constant coefficient linear differential equation. In equation (1.1.18), $\dot{X}(t)$ is a linear transformation of $X(t)$, with the transformation matrix $A(t)$ having time-varying components. Such a system is called a time-varying linear differential equation. In equation (1.1.19), $\dot{X}(t)$ is related to $X(t)$ by a vector of functions of the form

$$ \begin{aligned} \dot{x}_0(t) &= f_0[x_0(t), x_1(t), \ldots, x_m(t), t] \\ \dot{x}_1(t) &= f_1[x_0(t), x_1(t), \ldots, x_m(t), t] \\ &\;\;\vdots \\ \dot{x}_m(t) &= f_m[x_0(t), x_1(t), \ldots, x_m(t), t] \end{aligned} \tag{1.1.20} $$

where $f_0, f_1, \ldots, f_m$ are possibly non-linear functions of $X(t)$, and possibly of $t$ as well. This is, thus, the general form of a non-linear differential equation.

The observation of the process provides us with vectors of observations, which ideally are linear or non-linear transformations of some or all of the state variables. In practice, the measuring instruments introduce errors, which are assumed to be additive.


Thus, let $X_n$ be the vector of model state variables at $t = t_n$, let $Y_n$ be the vector of observations obtained at that time, and let $N_n$ denote the additive measurement error. The three successively more general observation schemes are then

$$ Y_n = M\,X_n + N_n \tag{1.1.21} $$

$$ Y_n = M_n X_n + N_n \tag{1.1.22} $$

$$ Y_n = G[X_n, t_n] + N_n \tag{1.1.23} $$

We consider the following examples of the above ideas. Let the state variables of the model be the position and velocity, in Cartesian coordinates, of a body in straight-line motion. Then its differential equation will be

$$ \frac{d}{dt} \begin{bmatrix} x_0(t) \\ x_1(t) \\ x_2(t) \\ \dot{x}_0(t) \\ \dot{x}_1(t) \\ \dot{x}_2(t) \end{bmatrix} = \begin{bmatrix} 0&0&0&1&0&0 \\ 0&0&0&0&1&0 \\ 0&0&0&0&0&1 \\ 0&0&0&0&0&0 \\ 0&0&0&0&0&0 \\ 0&0&0&0&0&0 \end{bmatrix} \begin{bmatrix} x_0(t) \\ x_1(t) \\ x_2(t) \\ \dot{x}_0(t) \\ \dot{x}_1(t) \\ \dot{x}_2(t) \end{bmatrix} \tag{1.1.24} $$

which is of the form of equation (1.1.17). Also, if $Y_n$ is the vector of observations made directly on the position coordinates of the body at time instant $t_n$, then the observation relation would be

$$ \begin{bmatrix} y_0 \\ y_1 \\ y_2 \end{bmatrix}_n = \begin{bmatrix} 1&0&0&0&0&0 \\ 0&1&0&0&0&0 \\ 0&0&1&0&0&0 \end{bmatrix} \begin{bmatrix} x_{0n} \\ x_{1n} \\ x_{2n} \\ \dot{x}_{0n} \\ \dot{x}_{1n} \\ \dot{x}_{2n} \end{bmatrix} + \begin{bmatrix} \nu_0 \\ \nu_1 \\ \nu_2 \end{bmatrix}_n \tag{1.1.25} $$

where $Y_n = [\,y_0\;\; y_1\;\; y_2\,]_n^T$ and $N_n = [\,\nu_0\;\; \nu_1\;\; \nu_2\,]_n^T$ is the measurement error at the observation instant $t_n$. Equation (1.1.25) is seen to be of the form of equation (1.1.21).


Alternatively, suppose $Y_n$ denotes the range, azimuth and elevation of the body as observed by a radar located at the Cartesian origin. The observation relation would then be

$$ y_{0n} = \left( x_{0n}^2 + x_{1n}^2 + x_{2n}^2 \right)^{1/2} + \nu_{0n} \tag{1.1.26} $$

$$ y_{1n} = \tan^{-1}\!\left( \frac{x_{1n}}{x_{0n}} \right) + \nu_{1n} \tag{1.1.27} $$

$$ y_{2n} = \tan^{-1}\!\left( \frac{x_{2n}}{\left( x_{0n}^2 + x_{1n}^2 \right)^{1/2}} \right) + \nu_{2n} \tag{1.1.28} $$

which is of the form of equation (1.1.23).
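The two observation relations can be written as functions of the six-component state vector of (1.1.24). This is a minimal sketch with the noise terms omitted, and the function names are hypothetical. `math.atan2` is used in place of a bare $\tan^{-1}$ so that the azimuth lands in the correct quadrant; this is a design choice, not something the text specifies.

```python
import math

def observe_position(X):
    """Linear observation Y = M X of eq. (1.1.25), noise omitted: keep the three position components."""
    M = [[1, 0, 0, 0, 0, 0],
         [0, 1, 0, 0, 0, 0],
         [0, 0, 1, 0, 0, 0]]
    return [sum(m * x for m, x in zip(row, X)) for row in M]

def observe_radar(X):
    """Non-linear observation Y = G(X) of eqs. (1.1.26)-(1.1.28), noise omitted:
    range, azimuth and elevation as seen from a radar at the Cartesian origin."""
    x0, x1, x2 = X[0], X[1], X[2]
    distance = math.sqrt(x0 ** 2 + x1 ** 2 + x2 ** 2)
    azimuth = math.atan2(x1, x0)
    elevation = math.atan2(x2, math.hypot(x0, x1))
    return [distance, azimuth, elevation]

X = [3.0, 4.0, 0.0, 1.0, -2.0, 0.5]     # position (3, 4, 0), velocity (1, -2, 0.5)
print(observe_position(X))              # [3.0, 4.0, 0.0]
print(observe_radar(X))                 # range 5.0, azimuth atan2(4, 3), elevation 0.0
```

The velocity components do not enter either observation, which is why a single observation vector cannot by itself determine the full state.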


In any modeling process, errors arise from two sources, namely, errors in the representative model itself and errors in the observations. The former are known as systematic errors and the latter are assumed to be random.

A first step in modeling in this framework is to choose a functional representation that adequately describes the process under observation. The efficacy of polynomials as functional representations is based on their ability to approximate any continuous function on a closed, finite interval to any degree of precision (the Weierstrass approximation theorem).

Assume that the process under observation is being modeled by a polynomial $x(t)$ of degree $k$. Denoting $j$-th order differentiation by $D^j$, one can choose a vector

$$ X_n = [\,x_n\;\; Dx_n\;\; \cdots\;\; D^k x_n\,]^T \tag{1.1.29} $$

Thus, for any $n$, $X_n$ of equation (1.1.29) provides all the information about the state of the assumed form of the process. It is accordingly referred to as a state-vector of the chosen model. For other forms of functional representation, the state-vector should be chosen so that it contains all the information about the assumed state of the process. More than one choice of state-vector is possible. For example, for the above model one can also choose delayed samples

$$ X_n = [\,x_n\;\; x_{n-1}\;\; \cdots\;\; x_{n-k}\,]^T \tag{1.1.30} $$

One can show that the state-vector $X_n$ is related to its earlier values by

$$ X_{n+m} = \Phi(m)\, X_n \tag{1.1.31} $$

where $\Phi(m)$ is known as the transition matrix, whose specific form depends on the functional representation chosen.

In dynamical systems terminology, $\Phi$ is called the flow of the system in a phase space of dimension $k+1$, the number of components of $X_n$.
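For the polynomial representation (1.1.29), the transition matrix can be written down explicitly. By Taylor's theorem, which is exact for a polynomial of degree $k$, $D^i x(t+h) = \sum_{j \ge i} h^{j-i} D^j x(t)/(j-i)!$. With a sampling interval $\tau$ (a hypothetical parameter), the entries of $\Phi(m)$ are therefore $(m\tau)^{j-i}/(j-i)!$ for $j \ge i$ and zero otherwise. The function names and the cubic test polynomial in the minimal sketch below are also hypothetical.

```python
import math

def transition_matrix(k, h):
    """Phi for the state X = [x, Dx, ..., D^k x]^T of a degree-k polynomial advanced by time h.
    Entry (i, j) is h^(j-i) / (j-i)! for j >= i (Taylor expansion) and 0 below the diagonal."""
    return [[h ** (j - i) / math.factorial(j - i) if j >= i else 0.0
             for j in range(k + 1)] for i in range(k + 1)]

def matvec(A, x):
    """Matrix-vector product for lists of lists."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def state(t):
    """State-vector [x, Dx, D^2 x, D^3 x] of the test polynomial x(t) = 2 + 3t - t^2 + 0.5 t^3."""
    return [2 + 3 * t - t ** 2 + 0.5 * t ** 3,
            3 - 2 * t + 1.5 * t ** 2,
            -2 + 3 * t,
            3.0]

tau, m = 0.1, 7
X_n = state(1.0)
X_nm = matvec(transition_matrix(3, m * tau), X_n)   # X_{n+m} = Phi(m) X_n, eq. (1.1.31)
print(X_nm)
print(state(1.0 + m * tau))                         # agrees up to rounding error
```

Since the matrix depends only on the elapsed time $m\tau$, it is the same for every $n$. This is what makes the model constant-coefficient.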

In equation (1.1.31), $\Phi(m)$ is a linear transformation. However, in the case of a non-linear differential equation given by (1.1.19), the flow is also a non-linear transformation. Therefore, a linear representation of the type given by equation (1.1.31) cannot be written. The general approach then is to linearize the differential equation. This is based on the assumption that a nominal trajectory $\bar{X}(t)$ is available sufficiently close to the original one $X(t)$, i.e., the difference vector $\delta X(t) = X(t) - \bar{X}(t)$ is sufficiently small that terms involving products and squares of its components can be ignored.

Specifically, one then has the following three successively more general situations:

a. Constant coefficient linear model:

$$ X_{n+m} = \Phi(m)\, X_n \tag{1.1.31} $$

b. Time-varying linear model:

$$ X_{n+m} = \Phi(m, n)\, X_n \tag{1.1.32} $$

c. Non-linear model:

$$ X_n = \bar{X}_n + \delta X_n, \qquad \delta X_{n+m} = \Phi(m, n; \bar{X})\, \delta X_n \tag{1.1.33} $$
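Case c can be illustrated numerically for a discrete-time system $X_{n+1} = f(X_n)$, such as a sampled flow. There, the matrix that carries $\delta X_n$ to $\delta X_{n+m}$ is the product of the Jacobians of $f$ evaluated along the nominal trajectory. The sketch below assumes the Hénon map ($a = 1.4$, $b = 0.3$) purely as a convenient non-linear example, not as a model from the thesis. It compares the linearized prediction of $\delta X_{n+m}$ with the actual difference between two nearby trajectories. The function names are hypothetical.

```python
def henon(X, a=1.4, b=0.3):
    """One step of the Henon map, a stand-in non-linear system."""
    x, y = X
    return (1.0 - a * x * x + y, b * x)

def henon_jacobian(X, a=1.4, b=0.3):
    """Jacobian matrix of the Henon map at X."""
    x, _ = X
    return ((-2.0 * a * x, 1.0), (b, 0.0))

def propagate_linearized(X_bar, dX, m):
    """delta X_{n+m} = J(X_bar_{n+m-1}) ... J(X_bar_n) delta X_n, i.e. Phi(m, n; X_bar) delta X_n."""
    for _ in range(m):
        (j11, j12), (j21, j22) = henon_jacobian(X_bar)
        dX = (j11 * dX[0] + j12 * dX[1], j21 * dX[0] + j22 * dX[1])
        X_bar = henon(X_bar)                 # advance the nominal trajectory
    return dX

X_bar = (0.0, 0.0)
for _ in range(100):                         # settle onto the attractor
    X_bar = henon(X_bar)

dX0 = (1e-7, -2e-7)                          # small perturbation delta X_n
X = (X_bar[0] + dX0[0], X_bar[1] + dX0[1])
m = 5
predicted = propagate_linearized(X_bar, dX0, m)

X_bar_m, X_m = X_bar, X
for _ in range(m):
    X_bar_m, X_m = henon(X_bar_m), henon(X_m)
actual = (X_m[0] - X_bar_m[0], X_m[1] - X_bar_m[1])
print(predicted, actual)                     # agree to several significant digits
```

The agreement degrades as $\delta X$ grows or $m$ increases, because the neglected second-order terms become significant.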

The process of observation introduces an observable $Y_n$, as given by equations (1.1.21)-(1.1.23). The problem of modeling then is to find an approximation $\Psi$ to the flow $\Phi$ from the observables $Y_n$. The criterion usually employed to find the approximant $\Psi$ is to minimize the squared error between the estimated state-vector $\hat{X}_n$ and the actual state-vector $X_n$, $1 \le n \le N$, i.e., to minimize

$$ E_N = \sum_{n=1}^{N} \left\| \hat{X}_n - X_n \right\|^2 \tag{1.1.34} $$

where

$$ \hat{X}_{n+1} = \Psi(X_n) \tag{1.1.35} $$

and $X_n$ is reconstructed from the observables $Y_n$.
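When $\Psi$ is restricted to a linear map, $\Psi(X) = \hat{\Phi} X$, minimizing (1.1.34) with the one-step predictions (1.1.35) is an ordinary linear least-squares problem. The sketch below assumes that the state-vectors are available directly and without noise. It uses a hypothetical damped-rotation matrix as the true flow and relies on NumPy's `numpy.linalg.lstsq`.

```python
import numpy as np

def fit_linear_flow(states):
    """Least-squares estimate of Phi in X_{n+1} = Phi X_n from a sequence of state-vectors.
    Minimizes sum_n || Phi X_n - X_{n+1} ||^2, i.e. eq. (1.1.34) with Psi(X) = Phi X."""
    X_now, X_next = states[:-1], states[1:]          # rows are X_n and X_{n+1}
    # Solve X_now @ Phi^T = X_next in the least-squares sense.
    Phi_T, *_ = np.linalg.lstsq(X_now, X_next, rcond=None)
    return Phi_T.T

theta, rho = 0.2, 0.98                               # hypothetical damped rotation
Phi_true = rho * np.array([[np.cos(theta), -np.sin(theta)],
                           [np.sin(theta),  np.cos(theta)]])
states = [np.array([1.0, 0.0])]
for _ in range(50):
    states.append(Phi_true @ states[-1])
states = np.array(states)

Phi_hat = fit_linear_flow(states)
print(np.allclose(Phi_hat, Phi_true))                # True for noise-free data
```

With noisy observations the same call returns the least-squares compromise rather than the exact matrix.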

State-space modeling schemes are also very popular in diverse fields, e.g., in the controls and communications areas of electrical engineering. Deterministic state-space modeling schemes can also be used to gain insight into the physical laws governing the phenomenon being modeled.
The viewpoint that complexity may often arise out of low-dimensional chaos leads to new approaches to signal modeling, which have been pursued in the thesis.

The following section discusses some of these aspects informally.

1.2 Signal Modeling in the Paradigm of Chaos

As seen in Section 1.1, most of the modeling schemes are linear. Even when the functional representation is non-linear, the attempt is to linearize it using nominal trajectories. This naturally breeds the misconception that complex behaviour can arise only out of complex systems, which is true when the additional constraint of linearity is imposed. Until recently, the only tool to analyse complex behaviour (e.g., a time series having a nearly flat spectrum) was based on Kolmogorov's theory of random processes. However, it is now known that simple non-linear systems are capable of extremely complex behaviour. Chaos provides a new tool to analyse complex behaviour deterministically. Most of the earlier studies in chaos were directed towards finding regions of chaotic behaviour in specific non-linear systems. Some of the recent studies, however, have focussed on modeling and predicting complex behaviour using the theoretical background developed for analysing chaotic behaviour.

Dissipative dynamical systems often have the property that undisturbed trajectories (or the evolution of the flow in state-space) approach a subset of the state-space called an attractor. This causes a drastic reduction in the number of degrees of freedom. Fluid flows, for example, have an effectively infinite-dimensional state-space but can have low-dimensional attractors [28].

Thus, the important distinction is not between chaos and randomness, but between systems with low-dimensional attractors and those with high-dimensional attractors. If a time series is produced by motion on a very high-dimensional attractor, then from a practical point of view it is impossible to gather enough information to exploit the underlying determinism. On the other hand, if the attractor dimension is low, then the motion has but a few degrees of freedom and, therefore, can be modeled using a few independent variables.
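One common way to estimate attractor dimension is the correlation dimension, which the thesis applies to phonemes in Chapter 3. It is the slope of $\log C(r)$ against $\log r$, where the correlation sum $C(r)$ is the fraction of pairs of points closer than $r$. The sketch below is a bare Grassberger-Procaccia estimate under simple assumptions: Euclidean distance, a hand-picked range of $r$, and no correction for temporally correlated pairs. It is applied to the Hénon attractor (an assumed test set, not data from the thesis), whose correlation dimension is reported in the literature as about 1.2. It uses NumPy.

```python
import numpy as np

def henon_orbit(n, a=1.4, b=0.3, discard=100):
    """n points on the Henon attractor, used here only as a test set of known low dimension."""
    x, y = 0.0, 0.0
    pts = []
    for i in range(n + discard):
        x, y = 1.0 - a * x * x + y, b * x
        if i >= discard:
            pts.append((x, y))
    return np.array(pts)

def correlation_sum(points, radii):
    """C(r) for each r: fraction of distinct pairs whose Euclidean distance is below r."""
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    iu = np.triu_indices(len(points), k=1)            # each pair once, no self-pairs
    d = dist[iu]
    return np.array([(d < r).mean() for r in radii])

pts = henon_orbit(1500)
radii = np.logspace(-2.0, -0.5, 7)                    # hand-picked scaling range
C = correlation_sum(pts, radii)
slope = np.polyfit(np.log(radii), np.log(C), 1)[0]    # correlation dimension estimate
print(slope)                                          # roughly 1.2 for the Henon attractor
```

In practice the scaling range must be chosen by inspecting the $\log C(r)$ curve, since too small an $r$ is dominated by sampling noise and too large an $r$ by the finite size of the attractor.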

The important first step in modeling in this framework, then, is to find the dimension of the attractor. In most situations even complex systems have low-dimensional attractors. If the attractors are high-dimensional, however, the probabilistic approach may be as good as any, possibly even better, and linear models may be optimal. The next step is to reconstruct an appropriate state-vector from the observations. A powerful tool to aid this was proposed by Packard et al. [59] and put on a firm mathematical foundation by Takens [60]. Informally put, it states that one need only observe a single variable of the system as it evolves with time. The state-vector may be reconstructed from this variable, and the reconstructed state-space will have the same invariant properties (e.g., attractor dimension) as the state-space of the actual system from which the single variable was observed. Thereafter, one may choose any non-linear functional representation for estimating the flow in the reconstructed space; at present the choice is ad hoc, in the absence of greater understanding. A non-linear function is needed so that apparently random behaviour can also be modeled, since in this viewpoint such behaviour is due to chaos rather than intractable complexity. Simple non-linear techniques may then be employed to obtain the parameters of the model.
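The reconstruction and fitting steps can be sketched end to end under stated assumptions. Only the $x$-component of the Hénon map (again an assumed stand-in system) is observed. Delay vectors $[x_n, x_{n-1}]$ serve as the reconstructed state, and a quadratic polynomial predictor of $x_{n+1}$ is fitted by linear least squares, since the model is linear in its coefficients. For the Hénon map the exact relation is $x_{n+1} = 1 - 1.4\,x_n^2 + 0.3\,x_{n-1}$, so the fit should recover it. The embedding dimension of 2, the unit delay and the monomial basis are choices made for this example. It is not necessarily the formulation of the global approximation technique used in Chapter 4.

```python
import numpy as np

def delay_vectors(x, dim, delay=1):
    """Rows [x_n, x_{n-delay}, ..., x_{n-(dim-1)delay}] for every n at which all entries exist."""
    start = (dim - 1) * delay
    return np.array([[x[n - j * delay] for j in range(dim)] for n in range(start, len(x))])

def quadratic_features(V):
    """Monomials of degree <= 2 in the two delay coordinates: 1, u, v, u^2, u v, v^2."""
    u, v = V[:, 0], V[:, 1]
    return np.column_stack([np.ones_like(u), u, v, u * u, u * v, v * v])

# Observe a single variable x_n of the Henon map (hypothetical test signal).
x, y, xs = 0.0, 0.0, []
for i in range(1100):
    x, y = 1.0 - 1.4 * x * x + y, 0.3 * x
    if i >= 100:                          # discard the transient
        xs.append(x)
xs = np.array(xs)

V = delay_vectors(xs, dim=2)              # reconstructed states [x_n, x_{n-1}]
F = quadratic_features(V[:-1])            # predictors built from each state
target = xs[2:]                           # the next observation x_{n+1}
coef, *_ = np.linalg.lstsq(F, target, rcond=None)
print(np.round(coef, 6))                  # approximately [1, 0, 0.3, -1.4, 0, 0]
```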

1.3 Scope and Organization of the Thesis 

The above ideas of signal modeling using deterministic chaos have been applied to the specific case of speech signals. The vocal tract is the underlying dynamical system in this case. We analysed speech signals in terms of phoneme utterances because all meaningful sounds can be constructed from a combination of phonemes. Dimensional analysis showed that phonemes are indeed generated by low-dimensional attractors. Also, the positive values of the second-order entropy showed that most phoneme time series are chaotic in nature. Thereafter, we considered two modeling techniques, namely, the global approximation technique and the compromised overlapping neighbourhood (CON) local approximation technique. Both use polynomials as functional representations, together with the above ideas. We compared the prediction properties of these two techniques with those of LPC (covariance method), which is extensively used in speech processing. The global approximation and the CON local approximation techniques have successively better prediction properties than LPC.

The thesis has been organized as follows:

Chapter 2 begins with a review of dynamical systems and their steady-state behaviour, including chaos. There are various descriptions of the above topics, but a topological description is adopted because it is felt that it provides the best insight into the signal modeling problem. The appropriate definitions of topological terms are introduced as and when they occur. This chapter also discusses the various dimensions and entropies that are used to characterize attractors and 'strange' or chaotic behaviour. Finally, Takens' theorems are presented; these are central to the technique of reconstructing the state-space from a single observed variable of the phenomenon.

Chapter 3 begins with a description of the vocal tract as a dynamical system, followed by a brief discussion on the classification of phonemes. Next, some practical aspects of obtaining the correlation dimension are discussed, followed by their application to phoneme time series. Then, the phoneme time series are analysed for their second-order entropy. The small values of correlation dimension show that the underlying attractors are low-dimensional, while the positive values of second-order entropy show the chaotic nature of phonemes.

Chapter 4 begins by building up a theoretical framework for the signal modeling problem. Various practical aspects are then considered. Some choices of functional representations and two approximation techniques, namely the global and local approximation techniques, are presented. The local approximation technique is not practical because of a prohibitively large model order, so a compromised overlapping neighbourhood local approximation technique is proposed. It has a lower model order than the usual local approximation technique, and yet attempts to retain the better prediction properties of the local approximation technique over the global approximation technique. Finally, a comparison of the computational complexity of the various schemes is also made.

Chapter 5 concludes the thesis with an overview of the work done and suggestions for future work.



CHAPTER 2 


REVIEW OF THEORETICAL ASPECTS 


This chapter begins with a definition of dynamical systems and their classification in terms of steady-state solutions and limit sets. It discusses attractors in terms of dimensions and entropies, and presents the embedding theorems which are used for reconstructing the state-space from a single observed variable. This chapter thus forms the basis of the theoretical framework for the signal modeling problem. We begin with dynamical systems.

2.1 Dynamical Systems 

Dynamical systems are often described in terms of the real space $\mathbb{R}^k$. In such a case, a dynamical system with a continuous time evolution can be specified by a system of differential equations. Three successively more general classes of differential equations were given by equations (1.1.17)-(1.1.19) and discussed therein. Dynamical systems may be grouped into autonomous and non-autonomous systems.

A $k$-th order autonomous dynamical system is defined by the state equation

$$ \dot{X} = f(X), \qquad X(t_0) = X_0 \tag{2.1.1} $$

where $X \in \mathbb{R}^k$ is the state at time $t$ and is a point in the phase space. The function $f: \mathbb{R}^k \to \mathbb{R}^k$ is called the vector field; it associates a tangent vector with each point in phase space. The solution to equation (2.1.1) with initial condition $X_0$ at time $t_0$ is called the trajectory; it lies in the phase space and is denoted by $\phi_t(X_0)$. The mapping $\phi_t: \mathbb{R}^k \to \mathbb{R}^k$ is called the flow of the system. The dynamical system given by equation (2.1.1) is linear if $f(X)$ is linear.
A $k$-th order non-autonomous dynamical system is defined by the time-varying state equation

$$ \dot{X} = f(X, t), \qquad X(t_0) = X_0 \tag{2.1.2} $$

The vector field $f$ depends on time. The solution to equation (2.1.2) passing through $X_0$ at time $t_0$ is $\phi_t(X_0, t_0)$. The system is linear if $f$ is linear with respect to $X$.

If there exists a $T > 0$ such that $f(X, t) = f(X, t+T)$ for all $X$ and all $t$, the system is said to be time-periodic with period $T$. The smallest such $T$ is called the minimal period. An $n$-th order time-periodic non-autonomous system can always be converted to an $(n+1)$-th order autonomous system by appending an extra state $\theta = 2\pi t/T$. The autonomous system will then be given by

$$ \begin{aligned} \dot{X} &= f(X, \theta T/2\pi), & X(0) &= X_0 \\ \dot{\theta} &= 2\pi/T, & \theta(0) &= 2\pi t_0/T \end{aligned} \tag{2.1.3} $$
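The conversion can be checked numerically. The sketch below assumes a forced Duffing-type vector field with forcing period $T = 2\pi/\omega$. It integrates both the non-autonomous system (2.1.2) and the augmented autonomous system (2.1.3) with the same classical fourth-order Runge-Kutta scheme. The parameter values, step size and function names are illustrative choices. Taking $t_0 = 0$ makes the two initial conditions coincide.

```python
import math

GAMMA, EPS, OMEGA = 0.3, 0.15, 1.0        # illustrative parameters, not taken from the thesis
T = 2 * math.pi / OMEGA                    # forcing period, so f(X, t) = f(X, t + T)

def f_nonauto(X, t):
    """Forced Duffing-type vector field f(X, t) with X = [x, y]."""
    x, y = X
    return [y, x - x ** 3 - EPS * y + GAMMA * math.cos(OMEGA * t)]

def f_auto(Z):
    """Autonomous form (2.1.3): Z = [x, y, theta]; time enters only through theta T / (2 pi).
    theta could be reduced mod 2 pi, since it appears only inside a T-periodic function."""
    x, y, theta = Z
    return f_nonauto([x, y], theta * T / (2 * math.pi)) + [2 * math.pi / T]

def rk4_nonauto(f, X, t, h):
    """One classical fourth-order Runge-Kutta step for dX/dt = f(X, t)."""
    k1 = f(X, t)
    k2 = f([a + 0.5 * h * b for a, b in zip(X, k1)], t + 0.5 * h)
    k3 = f([a + 0.5 * h * b for a, b in zip(X, k2)], t + 0.5 * h)
    k4 = f([a + h * b for a, b in zip(X, k3)], t + h)
    return [a + h / 6 * (b1 + 2 * b2 + 2 * b3 + b4)
            for a, b1, b2, b3, b4 in zip(X, k1, k2, k3, k4)]

def rk4_auto(f, Z, h):
    """The same step for dZ/dt = f(Z); the time argument is simply ignored."""
    return rk4_nonauto(lambda s, _t: f(s), Z, 0.0, h)

h, steps = 0.01, 2000
X, t = [1.0, 0.0], 0.0       # non-autonomous system, t0 = 0
Z = [1.0, 0.0, 0.0]          # autonomous system, theta(0) = 2 pi t0 / T = 0
for _ in range(steps):
    X = rk4_nonauto(f_nonauto, X, t, h)
    t += h
    Z = rk4_auto(f_auto, Z, h)
print(X, Z[:2])              # the two solutions agree up to rounding error
```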

Discrete-time dynamical systems can also be described along similar lines. Any map $f: \mathbb{R}^k \to \mathbb{R}^k$ defines a discrete-time dynamical system by the state equation

$$ X_{n+1} = f(X_n), \qquad n = 0, 1, \ldots \tag{2.1.4} $$

where $X_n \in \mathbb{R}^k$ is called the state and $f$ maps the state $X_n$ to the next state.

Most of the development pertaining to signal modeling is rooted in topological considerations. Consequently, a presentation of dynamical systems in terms of arbitrary manifolds is desirable. A description of some of the terms used in the presentation is now given, although a minimum of background knowledge is assumed.

A function is said to be of class $k$, or $C^k$, if its first $k$ partial derivatives exist and are continuous. A function $f: X \to Y$ is said to be one-to-one, or injective, if for each $y$ there is at most one $x$ such that $y = f(x)$. If the range of $f$ is the entire space $Y$, then the map is said to be onto, or surjective, and this is written as $f(X) = Y$. If $f$ is both one-to-one and onto, then it is bijective. A diffeomorphism is a $C^k$ bijection whose inverse is also $C^k$. A homeomorphism is a bijective map $f$ which is bicontinuous (i.e., $f$ and $f^{-1}$ are continuous).

An $m$-dimensional surface in $\mathbb{R}^k$ is defined by relations between the coordinates of $\mathbb{R}^k$. However, this notion can be generalized to a manifold, which does not need to be placed in $\mathbb{R}^k$. A manifold is a metric space for which every point has a neighbourhood homeomorphic to $\mathbb{R}^m$. This homeomorphism provides coordinates for the manifold by identifying elements of the manifold with the corresponding elements in $\mathbb{R}^m$. The topological dimension of a manifold $M$ is that of the corresponding real space, here $m$. A submanifold is a manifold contained in another one.

An open cover of a set $S$ is a collection of open sets whose union contains $S$. A subcover is a subset of a cover that is itself a cover. A finite cover is a cover with a finite number of elements. The set $S$ is said to be compact if every open cover of $S$ has a finite subcover; e.g., bounded, closed subsets of $\mathbb{R}^k$ are compact.

In the same way that the derivative of a function provides its best linear approximation, the best linear approximation to a manifold $M$ at a point $x$ is given by its tangent space, denoted by $T_x(M)$. For example, consider a 2-dimensional manifold $M$ placed in $\mathbb{R}^3$. In this case, the tangent space is just a plane. Let $C$ be a curve in $M$ ($C: \mathbb{R} \to \mathbb{R}^3$, $C(t_0) = x$). The tangent vector to $C$ at $x$ is a vector in $\mathbb{R}^3$ given by

$$ V_x = \left. \frac{dC}{dt} \right|_{t = t_0} \tag{2.1.5} $$

The set of tangent vectors to all curves passing through a point is the tangent space to the manifold at that point.
The following is a description of dynamical systems in terms of arbitrary manifolds.

Let the phase space of a system be a compact manifold $M$. A dynamical system on $M$ is a map $\phi: M \to M$ (discrete-time) or a vector field (continuous-time). The vector field associates each point in the manifold with an element of the tangent space at that point. The time evolution of a dynamical system is given by the flow $\phi^i(X_0)$ (discrete-time) or $\phi_t(X_0)$ (continuous-time). In experimental situations, the process of sampling the continuous-time flow introduces the discrete-time map, i.e., $\phi^i = \phi_{i\tau}$, $\tau$ being a fixed delay and $i \in \mathbb{Z}$. An observable on the dynamical system is a smooth function $y: M \to \mathbb{R}$.

2.2 Steady-State Behaviour, Limit Sets and Attractors

The steady-state behaviour of a dynamical system refers to its asymptotic behaviour. Only bounded steady-state responses are of interest to us. Some definitions are given before a discussion of the steady-state behaviour of dynamical systems.

An invariant set $S$ of a flow $\phi_t$ on a manifold $M$ is a subset of $M$ such that

$$ \phi_t(X) \in S \qquad \forall\, X \in S \text{ and } \forall\, t \tag{2.2.1} $$

A point $Y$ is a limit point of $X$ if, for every neighbourhood $U$ of $Y$, $\phi_t(X)$ repeatedly enters $U$ as $t \to \infty$.

The positive limit set (or $\omega$-limit set) $L(X)$ of a point $X$ is the set of limit points that the flow approaches with an initial condition of $X$:

$$ L(X) = \{\, Y \in M \mid \exists\, t_i \to \infty \text{ with } \phi_{t_i}(X) \to Y \,\} \tag{2.2.2} $$

Negative limit sets are defined for flows going backward in time. Limit sets are closed and invariant under $\phi_t$.
A limit set $L$ is an attracting set if it has a neighbourhood $U$ such that $\phi_t(U) \subset U$ for all $t > 0$ and $\phi_t(X) \to L$ as $t \to \infty$ for every $X \in U$ (i.e., the flow comes arbitrarily close to $L$).

The basin of attraction $B(L)$ of an attracting limit set $L$ is defined as the union of all such neighbourhoods $U$. Every trajectory in $B(L)$ tends towards $L$ as $t \to \infty$.

Limit sets are useful in describing the classical types of steady-state behaviour, e.g., equilibrium points, limit cycles and quasi-periodic attractors, but the definition cannot be extended to the complex steady-state behaviour found in chaotic systems. The term strange attractor is used to describe the object on which the trajectories of a chaotic system accumulate. Four different types of steady-state behaviour follow.

1 EC|UlllbriUrn points * equilibrium point Xgq of an autonomous system given by 
equation (2 1 i) is a constant solution 

( Xeq > = Xeq , for all t (2 2 3) 

At an equilibrium point the vector field vanishes, i.e., $f(X_{eq}) = 0$ (for a
map, the corresponding condition is $f(X_{eq}) = X_{eq}$). An example is the damped
pendulum equation

$$\dot{X} = Y, \qquad \dot{Y} = -kY - \sin X \qquad (2.2.4)$$

which has an infinity of equilibrium points at $(X, Y) = (n\pi, 0)$,
$n = 0, \pm 1, \pm 2, \dots$, each of which is an invariant set. The limit set for
an equilibrium point is the equilibrium point itself.
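As a rough numerical illustration of (2.2.4), the sketch below integrates the damped pendulum with a classical fourth-order Runge-Kutta scheme. The damping k = 0.5, the step size and the integration time are arbitrary choices made for this example, not values from the text. Trajectories settle onto equilibrium points with X an even multiple of π; the odd multiples are saddle points, which attract only a set of initial conditions of measure zero.

```python
import math

def rk4_step(f, state, t, h):
    """One classical fourth-order Runge-Kutta step for dX/dt = f(t, X)."""
    k1 = f(t, state)
    k2 = f(t + h / 2, [s + h / 2 * d for s, d in zip(state, k1)])
    k3 = f(t + h / 2, [s + h / 2 * d for s, d in zip(state, k2)])
    k4 = f(t + h, [s + h * d for s, d in zip(state, k3)])
    return [s + h / 6 * (a + 2 * b + 2 * c + e)
            for s, a, b, c, e in zip(state, k1, k2, k3, k4)]

def damped_pendulum(k):
    """Vector field of (2.2.4): dX/dt = Y, dY/dt = -k*Y - sin(X)."""
    def field(t, s):
        return [s[1], -k * s[1] - math.sin(s[0])]
    return field

def settle(x0, y0, k=0.5, h=0.01, t_end=200.0):
    """Integrate from (x0, y0) up to t_end and return the final state [X, Y]."""
    field = damped_pendulum(k)
    state, t = [x0, y0], 0.0
    for _ in range(int(round(t_end / h))):
        state = rk4_step(field, state, t, h)
        t += h
    return state

# A small swing decays to (0, 0); a large initial push carries the pendulum
# over the top before it settles at some (2*pi*m, 0).
x1, y1 = settle(1.0, 0.0)
x2, y2 = settle(3.0, 2.0)
print(x1, y1)
print(x2, y2)
```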

2. Periodic Solutions - $\phi_t(X^*, t_0)$ is a periodic solution if

$$\phi_{t+T}(X^*, t_0) = \phi_t(X^*, t_0) \qquad (2.2.5)$$

for all t and some minimal period T > 0. A periodic solution has a Fourier
transform consisting of a fundamental component at f = 1/T and evenly spaced
harmonics at k/T, k = 2, 3, .... The amplitudes of some of these spectral
components may be zero. Figure 2.1 shows the fundamental and subharmonic
solutions, together with their spectra, for Duffing's equation
$$\dot{X} = Y, \qquad \dot{Y} = X - X^3 - \epsilon Y + \gamma \cos \omega t \qquad (2.2.6)$$


In the autonomous case, an isolated¹ periodic solution $\phi_t(X^*)$ is called a
limit cycle. A limit cycle is a self-sustained oscillation and cannot occur in a
linear system. An example of a limit cycle is found in the van der Pol equation
$$\dot{X} = Y, \qquad \dot{Y} = (1 - X^2)\,Y - X \qquad (2.2.7)$$

Figure 2.2 shows the trajectory, time waveform and spectrum of limit cycle
behaviour.

The limit set corresponding to a limit cycle is the closed curve traced out
by $\phi_t(X^*)$ over one period.
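To make the notion of an isolated, self-sustained oscillation concrete, the sketch below integrates the van der Pol equation (2.2.7) from one initial condition inside the cycle and one outside it, discards a transient, and reports the peak |X| over the remaining window. Both runs should approach the same closed curve, whose amplitude for this equation is close to 2. The RK4 integrator, the step size and the time spans are illustrative assumptions.

```python
def rk4_step(f, state, t, h):
    """One classical fourth-order Runge-Kutta step for dX/dt = f(t, X)."""
    k1 = f(t, state)
    k2 = f(t + h / 2, [s + h / 2 * d for s, d in zip(state, k1)])
    k3 = f(t + h / 2, [s + h / 2 * d for s, d in zip(state, k2)])
    k4 = f(t + h, [s + h * d for s, d in zip(state, k3)])
    return [s + h / 6 * (a + 2 * b + 2 * c + e)
            for s, a, b, c, e in zip(state, k1, k2, k3, k4)]

def van_der_pol(t, s):
    """Vector field of (2.2.7): dX/dt = Y, dY/dt = (1 - X^2) Y - X."""
    x, y = s
    return [y, (1.0 - x * x) * y - x]

def peak_amplitude(x0, y0, h=0.01, t_transient=100.0, t_window=20.0):
    """Discard a transient, then return max |X| over a window of about three periods."""
    state, t = [x0, y0], 0.0
    for _ in range(int(round(t_transient / h))):
        state = rk4_step(van_der_pol, state, t, h)
        t += h
    peak = 0.0
    for _ in range(int(round(t_window / h))):
        state = rk4_step(van_der_pol, state, t, h)
        t += h
        peak = max(peak, abs(state[0]))
    return peak

inside = peak_amplitude(0.1, 0.0)   # starts near the unstable equilibrium at the origin
outside = peak_amplitude(4.0, 0.0)  # starts well outside the cycle
print(inside, outside)
```

Both initial conditions end on the same curve, which is what distinguishes a limit cycle from the continuum of closed orbits of a linear oscillator.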

3. Quasi-Periodic Solutions - A quasi-periodic solution can be described by a set of k
incommensurate frequencies $\omega_1, \dots, \omega_k$, i.e.,

$$X(t) = f(\omega_1 t, \omega_2 t, \dots, \omega_k t) \qquad (2.2.8)$$

where f is $2\pi$-periodic in each argument. The motion described by equation
(2.2.8) lies on a k-dimensional torus $T^k$ (i.e., a product of k circles)
embedded in $\mathbb{R}^n$, which constitutes a quasi-periodic attractor.
Quasi-periodic waveforms may be created when two or more periodic functions
interact non-linearly.
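A simple way to see the difference between periodic and quasi-periodic motion on a two-torus is to sample the phase of the second oscillator each time the first completes a cycle. With phases measured in cycles, this stroboscopic section is the rotation θ → θ + ω2/ω1 (mod 1). The sketch below assumes a golden-mean frequency ratio as an example of incommensurate frequencies and compares it with the rational ratio 1/3: the rational ratio revisits three points forever (a periodic orbit), while the irrational ratio fills the circle ever more densely.

```python
import math
from fractions import Fraction

def torus_section(ratio, n_points):
    """Phases (in cycles) of oscillator 2, sampled once per cycle of oscillator 1, sorted."""
    return sorted((n * ratio) % 1 for n in range(n_points))

def largest_gap(points):
    """Largest empty arc between sorted phases on the unit circle [0, 1)."""
    gaps = [b - a for a, b in zip(points, points[1:])]
    gaps.append(1 - points[-1] + points[0])  # arc that wraps around through 0
    return max(gaps)

golden = (math.sqrt(5) - 1) / 2                 # irrational ratio (assumed example)
quasi = torus_section(golden, 1000)
periodic = torus_section(Fraction(1, 3), 1000)  # exact arithmetic for the rational case

print(largest_gap(quasi))                # small: the section fills the circle
print(len(set(periodic)), largest_gap(periodic))
```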

Quasi-periodic solutions may be illustrated with the example of the van der
Pol equation (2.2.7). It possesses a limit cycle whose natural period $T_1$ depends
on the system parameters. Now add a sinusoidal term as follows:

¹ A periodic solution is isolated if there exists a neighbourhood of it that contains
no other periodic solution.







Figure 2.1 - Periodic solutions of Duffing's equation for γ = 0.3, ω = 1. a) Period-1
solution, b) Time waveform of a), c) Spectrum of b), d) Period-3 subharmonic,
e) Time waveform of d), f) Spectrum of e). Only odd harmonics are present in
both c) and f). (Reproduced from [20])



Figure 2.2 - Limit cycle for the van der Pol equation. a) Trajectory, b) The time
waveform, c) Spectrum of b); only odd harmonics are present, due to the symmetry
of the time waveform. (Reproduced from [20])
$$\dot{X} = Y, \qquad \dot{Y} = (1 - X^2)\,Y - X + A \cos(2\pi t / T_2) \qquad (2.2.9)$$

The solution to the forced system could synchronize with some multiple of
the input period $T_2$, resulting in a subharmonic. Alternatively, in a conflict
between $T_1$ and $T_2$, neither period wins, and quasi-periodic behaviour results. A
two-periodic trajectory of the forced van der Pol equation (2.2.9), along with its
time waveform and spectrum, is shown in Figure 2.3.

4. Chaos and Strange Attractors - There is no generally accepted definition of
chaos. From a practical point of view, chaos can be described as bounded steady-state
behaviour that is not an equilibrium point, not periodic and not quasi-periodic.
In the following, an attempt to describe chaotic behaviour is made, and
then a definition based on topological considerations is given.

An example of a chaotic trajectory, along with its time waveform and
spectrum, is shown in Figure 2.4. It is evident from the figure that the trajectory
is bounded, but not periodic, and does not have the uniform distribution
characteristic of quasi-periodic solutions. The limit set for chaotic behaviour is
not a set of simple geometrical objects like a circle or torus, but is related to
fractals and Cantor sets [19]. Another property of chaotic systems is sensitive
dependence on initial conditions, i.e., given two different initial conditions
arbitrarily close to one another, the trajectories emanating from these points
diverge at a rate characteristic of the system until, for all practical purposes,
they become uncorrelated (see Figure 2.5). In practice, the initial state of the
system can never be specified exactly, but only to within some tolerance $\epsilon > 0$. If
two initial conditions $X_0$ and $\tilde{X}_0$ lie within $\epsilon$ of one another, they cannot be
distinguished. However, after a finite amount of time, $\phi_t(X_0)$ and $\phi_t(\tilde{X}_0)$ will
diverge and become uncorrelated. Therefore, no matter how precisely the initial
condition is known, the long-term behaviour of a chaotic system cannot be
predicted. That is what is meant by the apparently random behaviour of such
deterministic systems. Many such systems are exploited to generate so-called
pseudo-random numbers.

Figure 2.3 - Quasi-periodic behaviour of the non-autonomous van der Pol
equation for A = 0.5 and $T_2 = 2\pi/1.1$. a) The trajectory, b) The time waveform, c)
Spectrum of b). (Reproduced from [20])

Figure 2.4 - a) Chaotic trajectory of a second-order non-autonomous system, b)
Time waveform, c) Spectrum of b). (Reproduced from [20])

Figure 2.5 - Two trajectories plotted versus time, showing sensitive dependence
on initial conditions in a third-order autonomous system. The initial conditions
differ by 0.01%. (Reproduced from [20])
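The divergence of nearby trajectories described above can be reproduced with a few lines of code. The sketch below uses the Lorenz equations with their customary parameters (σ = 10, ρ = 28, β = 8/3) as an assumed stand-in for a third-order autonomous system; they are not necessarily the system used for the figure. Two initial conditions differing by 0.01% are integrated side by side and their Euclidean distance is recorded: it starts near 10^-4 and grows until it is comparable to the size of the attractor.

```python
import math

def rk4_step(f, state, t, h):
    """One classical fourth-order Runge-Kutta step for dX/dt = f(t, X)."""
    k1 = f(t, state)
    k2 = f(t + h / 2, [s + h / 2 * d for s, d in zip(state, k1)])
    k3 = f(t + h / 2, [s + h / 2 * d for s, d in zip(state, k2)])
    k4 = f(t + h, [s + h * d for s, d in zip(state, k3)])
    return [s + h / 6 * (a + 2 * b + 2 * c + e)
            for s, a, b, c, e in zip(state, k1, k2, k3, k4)]

def lorenz(t, s, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Lorenz vector field (an assumed example of a third-order autonomous system)."""
    x, y, z = s
    return [sigma * (y - x), x * (rho - z) - y, x * y - beta * z]

def separation_history(x0, rel_perturbation=1e-4, h=0.01, t_end=40.0):
    """Distance between two trajectories whose initial conditions differ by rel_perturbation."""
    a = list(x0)
    b = [v * (1.0 + rel_perturbation) for v in x0]   # 0.01 % perturbation by default
    history, t = [], 0.0
    for _ in range(int(round(t_end / h))):
        a = rk4_step(lorenz, a, t, h)
        b = rk4_step(lorenz, b, t, h)
        t += h
        history.append(math.dist(a, b))
    return history

hist = separation_history([1.0, 1.0, 1.0])
print(hist[0], max(hist[-1000:]))   # tiny at first, of order the attractor size at the end
```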

Again, some terms are presented before giving a definition of chaos based
on the topological approach [16]. The discussion is in terms of a discrete-time
system, although it easily extends to the continuous-time case by analogy.

For any set S, its closure $\bar{S}$ consists of all points in S together with all
limit points of S (for the definition of a limit point, see the earlier part of this section).

A subset U of S is dense in S if $\bar{U} = S$.

A function $f: J \to J$ is said to be topologically transitive if, for any pair
of open sets $U, V \subset J$, there exists $k > 0$ such that $f^k(U) \cap V \neq \emptyset$. A map
possessing a dense orbit (equivalent to a trajectory in a flow) is topologically
transitive. Intuitively, a topologically transitive map has points which
eventually move under iteration from one arbitrarily small neighbourhood to any other.

A function $f: J \to J$ has sensitive dependence on initial conditions if there
exists $\delta > 0$ such that, for any $x \in J$ and any neighbourhood N of x, there
exist $y \in N$ and $n \ge 0$ such that $|f^n(x) - f^n(y)| > \delta$.

In other words, a map possesses sensitive dependence on initial conditions if
there exist points arbitrarily close to x which eventually separate from x by at
least $\delta$ under iteration of f. It is important to note that not all points near x
need eventually separate, but at least one such point in every neighbourhood must.

A chaotic map can now be defined as follows.

Let V be a set. $f: V \to V$ is said to be chaotic on V if

I. f has sensitive dependence on initial conditions,

II. f is topologically transitive, and

III. the periodic points of f are dense in V.

Thus, chaotic maps are indecomposable, and they possess an element of
regularity amidst long-term unpredictability. The unpredictability is due to
sensitive dependence on initial conditions. Because of topological transitivity,
the map cannot be decomposed into two subsystems (i.e., two invariant open sets)
which do not interact under f. The element of regularity comes from the periodic
points, which are dense.
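A standard textbook example that satisfies conditions I and III (and also II, which is harder to demonstrate in code) is the doubling map f(x) = 2x (mod 1) on the circle [0, 1). It is used here purely as an illustration and is not taken from the text. The sketch relies on exact rational arithmetic (Python's `fractions` module), because repeated doubling in binary floating point shifts every significant bit out within about fifty iterations. It checks that the points j/(2^n − 1) are periodic with period n and are spaced 1/(2^n − 1) apart, so periodic points become dense as n grows, and it counts how many iterations two points 2^-30 apart need before they are more than δ = 1/4 apart.

```python
from fractions import Fraction

def doubling(x):
    """Doubling map f(x) = 2x mod 1 on [0, 1)."""
    return (2 * x) % 1

def iterate(f, x, n):
    """Return f^n(x)."""
    for _ in range(n):
        x = f(x)
    return x

def periodic_points(n):
    """All points j / (2**n - 1); each satisfies f^n(x) = x for the doubling map."""
    return [Fraction(j, 2**n - 1) for j in range(2**n - 1)]

def circle_distance(x, y):
    """Distance between two phases on the unit circle."""
    d = abs(x - y) % 1
    return min(d, 1 - d)

def steps_to_separate(x, y, delta=Fraction(1, 4), max_steps=200):
    """Smallest k with circle distance of f^k(x), f^k(y) above delta (None if never)."""
    for k in range(max_steps + 1):
        if circle_distance(x, y) > delta:
            return k
        x, y = doubling(x), doubling(y)
    return None

pts = periodic_points(6)           # 63 points of period dividing 6
x0 = Fraction(1, 3)
print(steps_to_separate(x0, x0 + Fraction(1, 2**30)))  # separation doubles each step
```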

An operational definition of a strange attractor can now be given. A
strange attractor is an attracting set possessing the properties of invariance,
topological transitivity and sensitive dependence on initial conditions. It is the
sensitive dependence on initial conditions that makes the attractor strange.

Lyapunov exponents are a convenient way of categorizing steady-state
behaviour [49,51,54]. Given a continuous dynamical system in an n-dimensional phase
space, the long-term evolution of an infinitesimal n-sphere of initial conditions is
monitored. The sphere becomes an n-ellipsoid due to the locally deforming nature
of the flow. The i-th one-dimensional Lyapunov exponent is defined in terms of the
length of the ellipsoidal principal axis $p_i(t)$:

$$\lambda_i = \lim_{t \to \infty} \frac{1}{t} \log_2 \frac{p_i(t)}{p_i(0)} \qquad (2.2.10)$$

Thus, Lyapunov exponents are related to the expanding or contracting
nature of different directions in phase space. For an attractor, contraction must
outweigh expansion, so that

$$\sum_{i=1}^{n} \lambda_i < 0$$
The Lyapunov exponents are generally written in decreasing order, i.e.,
$\lambda_1 \ge \lambda_2 \ge \lambda_3 \ge \dots$. For a stable equilibrium point,
$\lambda_i < 0$ for all i. For a stable limit cycle, $\lambda_1 = 0$ and $\lambda_i < 0$
for i = 2, 3, ..., n. For a stable quasi-periodic attractor with k rationally
independent frequencies, $\lambda_1 = \dots = \lambda_k = 0$ and
$\lambda_{k+1}, \dots, \lambda_n < 0$. A strange attractor has at least one positive
Lyapunov exponent. Extraction of Lyapunov exponents from experimental time series
is difficult, although partial attempts in this direction have been made [50,53,55].
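For a one-dimensional map $x_{n+1} = f(x_n)$, the analogue of (2.2.10) reduces to the orbit average of log|f'(x_n)|. The sketch below applies this to the logistic map f(x) = r x (1 − x), an assumed example not used in the text, with natural logarithms (divide by ln 2 to obtain the base-2 units of (2.2.10)). At r = 4 the exponent is known to equal ln 2 ≈ 0.693, i.e., one bit per iteration, while at r = 3.2 the orbit settles on a stable period-2 cycle and the exponent is negative.

```python
import math

def logistic_lyapunov(r, x0=0.3, n_transient=1000, n_iter=100_000):
    """Estimate the Lyapunov exponent (natural log, per iteration) of x -> r x (1 - x)."""
    x = x0
    for _ in range(n_transient):          # let the orbit settle onto its attractor
        x = r * x * (1.0 - x)
    total = 0.0
    for _ in range(n_iter):
        total += math.log(abs(r * (1.0 - 2.0 * x)))   # log |f'(x_n)|
        x = r * x * (1.0 - x)
    return total / n_iter

chaotic = logistic_lyapunov(4.0)
periodic = logistic_lyapunov(3.2)
print(chaotic, chaotic / math.log(2), periodic)
```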

2.3 Attractor Dimensions and Entropies 

In this section, we study the classification of attractors and trajectories
using the concepts of dimensions and entropies [18,20,31].

An attractor is said to be k-dimensional if, in a neighbourhood of every
point, it is diffeomorphic to an open subset of $\mathbb{R}^k$. For example, a limit
cycle is one-dimensional since it looks locally like an interval. A torus is
two-dimensional since it locally resembles an open subset of $\mathbb{R}^2$. An
equilibrium point has zero dimension. The neighbourhood of any point of a strange
attractor, however, has a fine structure which does not resemble any Euclidean
space, and strange attractors typically have non-integer dimension.

There are various generalized dimensions to cope with non-integral
situations, e.g., capacity and Hausdorff dimension, information dimension and
correlation dimension. Instead of presenting them individually, a unified
procedure is adopted [38,45].

Given some partition $\Phi$ consisting of M elements $C_i$ in the sample space, one
can count the number of times $N_i$ an element is found in $C_i$. The discrete
probability is then given by $p_i = N_i/N$, where N is the total number of elements
in the considered sample space. The q-th order information $H_q$ is defined as

$$H_q = \frac{1}{1-q} \log \sum_{i=1}^{M} p_i^q \qquad (2.3.1)$$
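Equation (2.3.1) is straightforward to evaluate once the occupation counts $N_i$ of the partition elements are known. The helper below is a minimal sketch; it treats q = 1 as the limiting Shannon case −Σ p_i log p_i and, as an assumed convention, ignores empty cells, so that H_0 is the logarithm of the number of occupied cells. For a uniform distribution every H_q equals log M, and in general H_q does not increase with q.

```python
import math

def order_q_information(counts, q):
    """H_q of (2.3.1) from cell counts N_i, using natural logarithms."""
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]   # empty cells contribute nothing
    if math.isclose(q, 1.0):
        return -sum(p * math.log(p) for p in probs)  # Shannon limit q -> 1
    return math.log(sum(p ** q for p in probs)) / (1.0 - q)

uniform = [5] * 8          # eight equally occupied cells
skewed = [6, 1, 1, 0]      # three occupied cells, one empty
print([order_q_information(uniform, q) for q in (0, 1, 2)])
print([order_q_information(skewed, q) for q in (0, 1, 2)])
```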

Now consider the basin of attraction as a sample space and take a finite
partition $\Phi(r)$ with diameter r of this basin, in which the trajectory X(t) is
situated. Then

$$H_q(r) = \inf H_q(\Phi(r)) \qquad (2.3.2)$$

denotes the q-th order information, which depends on the partition diameter r.
$H_q(r)$ is given by the infimum of the informations resulting from all possible
partitions $\Phi(r)$.


One can define a quantity $D_q$ of order q given by

$$D_q = -\lim_{r \to 0} \frac{H_q(r)}{\log r} \qquad (2.3.3)$$

which is the generalized q-th order dimension of the attractor.


The fractal dimension or capacity of the attractor is $D_0 = \lim_{q \to 0} D_q$,
given by

$$D_0 = -\lim_{r \to 0} \frac{\log M(r)}{\log r} \qquad (2.3.4)$$


It is determined by M(r), the minimal number of cubes with edge length r
needed to cover the attractor. Equation (2.3.4) is equivalent to Mandelbrot's
definition of the fractal dimension [19], which originates from Hausdorff. For
manifolds, $D_0$ is equal to the dimension of the manifold, which is an integer.
However, for objects with fractal structure, $D_0$ is usually a non-integer. The
following two examples illustrate this.
Example 1 - The unit interval. Cover the unit interval [0,1] with volume elements
(intervals) of length $r = 1/3^n$ (Figure 2.6(a)). Then $M(r) = 3^n$. To refine the
covering, let $n \to \infty$ to arrive at

$$D_0 = -\lim_{n \to \infty} \frac{\ln 3^n}{\ln (1/3^n)} = 1 \qquad (2.3.5)$$

Thus, the unit interval has dimension 1, as expected.

Example 2 - The middle-third Cantor set. Remove the middle third of the unit
interval, leaving the two intervals [0,1/3] and [2/3,1]. Remove the middle third
of each of these intervals, leaving four intervals (see Figure 2.6(b)). Repeat this
process ad infinitum. The resulting set is called the middle-third Cantor set. It
has interesting properties, such as a fractional dimension and a fine structure
similar to that observed in chaotic systems. To show that $D_0$ of this Cantor set is
not an integer, choose a covering of intervals with length $r = 1/3^n$. Then
$M(r) = 2^n$ and

$$D_0 = -\lim_{n \to \infty} \frac{\ln 2^n}{\ln (1/3^n)} = \frac{\ln 2}{\ln 3} \approx 0.6309 \qquad (2.3.6)$$

Hence, the Cantor set is something more than a point (dimension 0), but
something less than an interval (dimension 1).

Figure 2.6 - Two examples of the fractal dimension. a) The unit interval, b) The
middle-third Cantor set.
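The two examples can be checked numerically by box counting. The sketch below represents points at construction level n as integers in units of 3^-n, so that counting the occupied boxes of edge r = 3^-k is exact integer arithmetic, and then fits the slope of log M(r) against log(1/r) by least squares. For the level-n Cantor set the count is exactly 2^k, and for the grid on the unit interval it is 3^k, so the fitted slopes reproduce (2.3.6) and (2.3.5). The level and the least-squares fit are illustrative choices.

```python
import math

def cantor_points(level):
    """Left endpoints of the 2**level Cantor intervals, as integers in units of 3**-level."""
    points = [0]
    for k in range(level):
        step = 2 * 3 ** (level - 1 - k)   # ternary digit 2 at position k
        points = points + [p + step for p in points]
    return points

def box_count(points, level, k):
    """M(r): number of boxes of edge r = 3**-k (k <= level) that contain a point."""
    size = 3 ** (level - k)
    return len({p // size for p in points})

def capacity(points, level):
    """Least-squares slope of log M(r) versus log(1/r) over r = 3**-1 ... 3**-level."""
    xs = [k * math.log(3) for k in range(1, level + 1)]
    ys = [math.log(box_count(points, level, k)) for k in range(1, level + 1)]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    den = sum((a - mx) ** 2 for a in xs)
    return num / den

level = 10
d_cantor = capacity(cantor_points(level), level)
d_interval = capacity(list(range(3 ** level)), level)
print(d_cantor, math.log(2) / math.log(3), d_interval)
```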


The information dimension [32] of the attractor is $D_1 = \lim_{q \to 1} D_q$,
given by

$$D_1 = -\lim_{r \to 0} \frac{S(r)}{\log r} \qquad (2.3.7)$$

where

$$S(r) = -\sum_{i=1}^{M(r)} p_i \log p_i \qquad (2.3.8)$$

is the entropy, i.e., the amount of information needed to specify the state of the
system to an accuracy r if the state is known to be on the attractor; hence the
name information dimension. While $D_0$ does not utilize any information about the
time behaviour of dynamical systems, $D_1$ takes into account the relative frequency
with which a trajectory visits each cube of edge length r.


The correlation dimension [35] of the attractor is $D_2 = \lim_{q \to 2} D_q$,
given by

$$D_2 = \lim_{r \to 0} \frac{\log C(r)}{\log r} \qquad (2.3.9)$$

where

$$C(r) = \lim_{N \to \infty} \frac{1}{N^2} \sum_{\substack{i,j=1 \\ i \neq j}}^{N} \Theta\left(r - |X_i - X_j|\right) \qquad (2.3.10)$$

$\Theta$ being the Heaviside function, $\Theta(x) = 1$ for $x > 0$ and $\Theta(x) = 0$
otherwise. It has been shown that $D_2 \le D_1 \le D_0$; equality holds if the points
are distributed uniformly over the attractor. $D_2$ is the easiest to compute from an
observed time series [35] and is generally the one that is computed to get an idea
of the attractor dimension.
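The following sketch estimates $D_2$ in the spirit of (2.3.9) and (2.3.10). It uses the Hénon map (a = 1.4, b = 0.3) as an assumed test attractor, for which published estimates of the correlation dimension are close to 1.2, computes all pairwise distances once, and fits the slope of log C(r) against log r over a small range of radii. Points scattered along a straight segment serve as a sanity check with slope close to 1. The number of points, the radius range and the normalisation by the number of distinct pairs (which rescales C(r) by a constant and so leaves the slope unchanged) are illustrative choices; with finite data the result is only an estimate.

```python
import bisect
import math
import random

def henon_points(n, a=1.4, b=0.3, n_transient=1000):
    """n points on the Henon attractor (an assumed example, not from the text)."""
    x, y = 0.1, 0.1
    for _ in range(n_transient):
        x, y = 1.0 - a * x * x + y, b * x
    pts = []
    for _ in range(n):
        x, y = 1.0 - a * x * x + y, b * x
        pts.append((x, y))
    return pts

def correlation_dimension(points, r_min=0.01, r_max=0.1, n_radii=6):
    """Slope of log C(r) versus log r, with C(r) the fraction of pairs closer than r."""
    n = len(points)
    dists = sorted(math.dist(points[i], points[j])
                   for i in range(n) for j in range(i + 1, n))
    radii = [r_min * (r_max / r_min) ** (i / (n_radii - 1)) for i in range(n_radii)]
    xs = [math.log(r) for r in radii]
    # bisect_left counts distances strictly below r, i.e. Theta(r - d) with Theta(0) = 0
    ys = [math.log(bisect.bisect_left(dists, r) / len(dists)) for r in radii]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((u - mx) * (v - my) for u, v in zip(xs, ys))
    den = sum((u - mx) ** 2 for u in xs)
    return num / den

rng = random.Random(0)
segment = [(t, 2.0 * t) for t in (rng.random() for _ in range(1000))]
d_henon = correlation_dimension(henon_points(1000))
d_segment = correlation_dimension(segment)
print(d_henon, d_segment)
```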


We now consider the definition of the q-th order entropy of a trajectory X(t)
situated in the basin of the attractor. The considered points X(t) along the
trajectory are separated by a constant time increment $\tau$. The whole phase space
is partitioned into cubes of edge length r. $p(i_1, i_2, \dots, i_d)$ is the joint
probability that $X(t=\tau)$ is in cube $i_1$, $X(t=2\tau)$ is in cube $i_2$, ..., and
$X(t=d\tau)$ is in cube $i_d$. The q-th order entropy is defined as

$$K_q = -\lim_{r \to 0} \lim_{d \to \infty} \frac{1}{d\tau(q-1)} \log \sum_{i_1, \dots, i_d} p^q(i_1, \dots, i_d) \qquad (2.3.11)$$

$K_0$ is the topological entropy.

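The first-order entropy, $K_1 = \lim_{q \to 1} K_q$, is the metric or Kolmogorov
entropy, which is a measure of the internal information production of the system
during its temporal evolution:

$$K_1 = -\lim_{r \to 0} \lim_{d \to \infty} \frac{1}{d\tau} \sum_{i_1, \dots, i_d} p(i_1, \dots, i_d) \log p(i_1, \dots, i_d) \qquad (2.3.12)$$

According to the Pesin identity, $K_1$ is equal to the sum of the positive Lyapunov
exponents of the system (under suitable conditions on the invariant measure; in
general this sum is an upper bound on $K_1$).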
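The sketch below estimates the entropy per step (τ = 1 iteration) from block probabilities, in the spirit of (2.3.12), for the logistic map x → 4x(1 − x) with the two-cell partition [0, 0.5) and [0.5, 1]. Both the map and the partition are assumptions made for illustration. This partition is known to be generating for this map, so a fixed partition can stand in for the limit r → 0, and the conditional block entropy H_{d+1} − H_d is used because it typically converges faster than H_d/d. The expected value is ln 2 ≈ 0.693 per iteration, which equals the map's single positive Lyapunov exponent, as Pesin's identity requires; a periodic orbit (r = 3.2) gives zero.

```python
import math
from collections import Counter

def logistic_symbols(n, r=4.0, x0=0.3, n_transient=1000):
    """Symbolic itinerary of x -> r x (1 - x): 0 if x < 0.5, else 1."""
    x = x0
    for _ in range(n_transient):
        x = r * x * (1.0 - x)
    symbols = []
    for _ in range(n):
        symbols.append(0 if x < 0.5 else 1)
        x = r * x * (1.0 - x)
    return symbols

def block_entropy(symbols, d):
    """H_d = -sum p(i_1..i_d) ln p(i_1..i_d) over the observed blocks of length d."""
    counts = Counter(tuple(symbols[i:i + d]) for i in range(len(symbols) - d + 1))
    total = sum(counts.values())
    return -sum(c / total * math.log(c / total) for c in counts.values())

def entropy_per_step(symbols, d):
    """Conditional block entropy H_{d+1} - H_d, in nats per iteration."""
    return block_entropy(symbols, d + 1) - block_entropy(symbols, d)

k_chaotic = entropy_per_step(logistic_symbols(200_000), d=8)
# The stable period-2 cycle at r = 3.2 lies entirely in [0.5, 1], so its itinerary is constant.
k_periodic = entropy_per_step(logistic_symbols(10_000, r=3.2), d=8)
print(k_chaotic, math.log(2), k_periodic)
```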

Similar to the second-order dimension $D_2$, a second-order entropy $K_2$ can
be defined by $C_d(r)$, the correlation function C(r) for reconstruction
dimension d:

$$K_2 = \lim_{r \to 0} \lim_{d \to \infty} \frac{1}{\tau} \log \frac{C_d(r)}{C_{d+1}(r)} \qquad (2.3.13)$$

Some practical points regarding the evaluation of the correlation dimension
and the second-order entropy will be discussed in Chapter 3 in the context of
application to phoneme time series.

As in the case of dimensions, it has been shown that $K_2 \le K_1 \le K_0$. The
limiting cases $K_1 = 0$ and $K_1 \to \infty$ characterize situations of regular (e.g.,
periodic) and random behaviour, respectively, while a finite positive value
$0 < K_1 < \infty$ signifies chaotic behaviour. Since $K_2 \le K_1$, $K_2 > 0$ is a
sufficient condition for deterministic chaos.
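In practice $K_2$ is estimated from a scalar series via (2.3.13), using delay vectors and correlation sums, which avoids partitioning phase space. The sketch below is a minimal version under stated assumptions: the maximum norm is used for distances (a common choice that guarantees $C_{d+1}(r) \le C_d(r)$), a single radius and a single pair of dimensions d and d + 1 stand in for the double limit, τ is one sample, and the test signals are the x-coordinate of the Hénon map (an assumed chaotic example) and a sampled sine (a regular one). A clearly positive value for the former and a value near zero for the latter is the expected qualitative outcome; the numbers themselves are rough estimates.

```python
import math

def henon_x(n, a=1.4, b=0.3, n_transient=1000):
    """Scalar observable: the x-coordinate of the Henon map (assumed example)."""
    x, y = 0.1, 0.1
    for _ in range(n_transient):
        x, y = 1.0 - a * x * x + y, b * x
    out = []
    for _ in range(n):
        x, y = 1.0 - a * x * x + y, b * x
        out.append(x)
    return out

def k2_estimate(series, d, r, tau=1.0):
    """(1/tau) ln(C_d(r) / C_{d+1}(r)) from delay vectors, using the maximum norm."""
    vectors = [series[i:i + d + 1] for i in range(len(series) - d)]
    n = len(vectors)
    close_d = close_d1 = 0
    for i in range(n):
        vi = vectors[i]
        for j in range(i + 1, n):
            vj = vectors[j]
            if abs(vi[0] - vj[0]) >= r:            # cheap rejection on the first coordinate
                continue
            if all(abs(vi[k] - vj[k]) < r for k in range(1, d)):
                close_d += 1                       # close in the first d coordinates
                if abs(vi[d] - vj[d]) < r:         # also close in coordinate d + 1
                    close_d1 += 1
    # both sums use the same set of vector pairs, so the normalisations cancel
    return math.log(close_d / close_d1) / tau

k_chaotic = k2_estimate(henon_x(1500), d=6, r=0.05)
k_regular = k2_estimate([math.sin(n) for n in range(1500)], d=6, r=0.05)
print(k_chaotic, k_regular)
```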

2.4 Reconstruction of Attractors

A system's dynamics can be reconstructed from a single degree of freedom,
as suggested by Packard et al. [59], and put on a firm mathematical foundation by
TakensCSO] 

Consider a dynamical system described in terms of arbitrary manifolds, as
in Section 2.1. It is briefly restated as follows.

The system to be considered is a manifold M in which the system state can
be defined. Consider a map $\phi: M \to M$ (discrete-time) or a vector field X on M
(continuous-time) which defines the dynamics, the flow $\phi^n(X_0)$ (discrete-time) or
$\phi_t(X_0)$ (continuous-time), and an observable $y: M \to \mathbb{R}$. Takens showed that M
can be embedded in $\mathbb{R}^k$ through the observable y, that the limit set of the flow
is reproduced, and that the attractor dimension is unchanged by the embedding.

In the following, the relevant theorems due to Takens are stated without
proofs.

Theorem (Takens) - Let M be a compact manifold of dimension m. For pairs
$(\phi, y)$, with $\phi: M \to M$ a smooth diffeomorphism and $y: M \to \mathbb{R}$ a smooth
function, it is a generic property that the map $\Phi_{(\phi,y)}: M \to \mathbb{R}^{2m+1}$
defined by

$$\Phi_{(\phi,y)}(X) = \left( y(X),\, y(\phi(X)),\, \dots,\, y(\phi^{2m}(X)) \right) \qquad (2.4.1)$$

is an embedding; by smooth we mean at least $C^2$.

For each point $X \in M$, $\Phi(X)$ gives a point in $\mathbb{R}^{2m+1}$, and this mapping is
an embedding. While 2m+1 is the largest dimension of real space needed to fit the
manifold, it often turns out that a smaller dimension suffices. Only for a
pathological or trivial choice of M, y and $\phi$ will the derivatives of the iterates
of the map be linearly dependent. If such is the case, then an arbitrary
perturbation in the space of possible manifolds, functions or observables will
remove the linear dependence. In practice, the presence of noise guarantees this
perturbation.
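Delay-coordinate reconstruction itself is simple to express in code. The helper below builds the vectors $(y_n, y_{n+lag}, \dots, y_{n+(dim-1)lag})$ from a scalar series. As an assumed illustration, not taken from the text, it is applied to the x-coordinate of the Hénon map, a system with a two-dimensional state space (m = 2). Even with three delays, fewer than the 2m + 1 = 5 guaranteed by the theorem, every reconstructed vector satisfies the deterministic rule $x_{n+2} = 1 - a x_{n+1}^2 + b x_n$, which shows that here the delay coordinates carry the full dynamics.

```python
def delay_embed(series, dim, lag=1):
    """Return the delay vectors (y_n, y_{n+lag}, ..., y_{n+(dim-1)*lag}) as tuples."""
    n_vectors = len(series) - (dim - 1) * lag
    if n_vectors <= 0:
        raise ValueError("series too short for this dimension and lag")
    return [tuple(series[i + k * lag] for k in range(dim)) for i in range(n_vectors)]

def henon_x(n, a=1.4, b=0.3, n_transient=1000):
    """Scalar observable: the x-coordinate of the Henon map (assumed example)."""
    x, y = 0.1, 0.1
    for _ in range(n_transient):
        x, y = 1.0 - a * x * x + y, b * x
    out = []
    for _ in range(n):
        x, y = 1.0 - a * x * x + y, b * x
        out.append(x)
    return out

a, b = 1.4, 0.3
vectors = delay_embed(henon_x(2000, a, b), dim=3)
# every delay vector obeys the Henon rule written purely in delay coordinates
residual = max(abs(1.0 - a * x1 * x1 + b * x0 - x2) for x0, x1, x2 in vectors)
print(delay_embed([0, 1, 2, 3, 4, 5], dim=3, lag=2), residual)
```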

Takens also showed that, instead of time delays, the observable and its first
2m time derivatives (2m+1 coordinates in all) will also work. It is an open
question whether other choices for the set of functions might in some sense be
optimal. However, time delays are the most commonly used functions due to their
simplicity.

Generally, sampled versions of signals are observed in practical
situations. The process of sampling introduces a discrete-time map $\phi_\tau$ from the
continuous-time flow. Takens has shown that the limit sets for the continuous-time
flow and the discrete-time map derived from the flow are identical for most
choices of time increment, where "most" means a residual subset (a countable
intersection of open dense subsets) of time increments.

Theorem (Takens) - Let M be a compact manifold, X a vector field on M with flow
$\phi_t$, and P a point in M. Then there is a residual subset $C_{X,P}$ of positive real
numbers such that for $\tau \in C_{X,P}$, the positive limit sets of P for the flow of X
and for the diffeomorphism $\phi_\tau$ are the same. In other words, for $\tau \in C_{X,P}$,
each point $q \in M$ which is the limit of a sequence $\phi_{t_i}(P)$, $t_i \in \mathbb{R}$,
$t_i \to \infty$, is the limit of a sequence $\phi_{n_i \tau}(P)$, $n_i \in \mathbb{N}$, $n_i \to \infty$.

These two theorems pave the way for a corollary that finally gives the
central result on observing dynamical systems. In the above discussion and
theorems, it was shown that discrete-time maps on the manifolds lead to an
embedding in $\mathbb{R}^{2m+1}$ through the observable and time delays, and that the
conversion of a continuous-time flow into a discrete-time map by choosing a fixed
sampling interval gives the same limit set. These results together imply that the
discretely sampled data set embedded into $\mathbb{R}^{2m+1}$ has the same limit set as the
real system.


Corollary (Takens) - Let M be a compact manifold of dimension m. We consider
quadruples consisting of a vector field X, a function y, a point P and a positive
real number $\tau$. For generic such $(X, y, P, \tau)$ (more precisely, for generic (X, y)
and $\tau$ satisfying generic conditions depending on X and P), the positive limit set
$L^+(P)$ is diffeomorphic with the set of limit points of the following sequence in
$\mathbb{R}^{2m+1}$:

$$\Phi_{X,y,P,\tau} = \left\{ \left( y(\phi_{k\tau}(P)),\, y(\phi_{(k+1)\tau}(P)),\, \dots,\, y(\phi_{(k+2m)\tau}(P)) \right) \right\}_{k=0}^{\infty} \qquad (2.4.2)$$

Here diffeomorphic means that there is a smooth embedding of M into
$\mathbb{R}^{2m+1}$ mapping $L^+(P)$ bijectively to the set of limit points of this sequence.


The final theorem states that the capacity of the limit set of the flow of
the real system is the same as the capacity of the embedded flow. This follows
from the previous observation that dimensions are preserved by embeddings.

Theorem (Takens) - $d_C(L^+(P)) = d_C(\Phi(L^+(P)))$ (2.4.3)

where $d_C$ denotes the capacity.

In this chapter, a review of the preliminaries for signal modeling was
given. Only those aspects of dynamical systems behaviour and its characterization
that will be relevant later in the thesis were discussed. Thus, for example, the
otherwise important topic of routes to chaos was not discussed. In the next
chapter, we will be concerned with the vocal tract as a dynamical system and will
discuss the practical aspects of dimension and entropy analysis in the context of
speech signals.



CHAPTER 3 


ON SPEECH PRODUCTION AND ATTRACTOR DIMENSION 
AND ENTROPY OF PHONEME TIME SERIES 

This chapter begins with a presentation of the vocal tract as a dynamical
system, the mechanism of speech production and its traditional analysis.
Thereafter, the ideas developed in the previous chapter are used to carry out
dimension and entropy analysis in terms of unit utterances, namely phonemes. The
practical aspects of the analysis scheme are also considered. The analysis shows
that phoneme time series are generated by low-dimensional attractors. Moreover,
the positive values of the second-order entropy show the chaotic nature of phoneme
time series. These two observations have various implications; e.g., synthesis of
phonemes by extracting formant frequencies may not be adequate, as indeed the
synthesised sounds indicate. Moreover, analysis of speech signals in terms of
linear systems is inadequate and, in general, long-term prediction as well as
higher-level recognition may indeed be impossible. We are therefore motivated to
model speech signals using non-linear functional representations which are also
capable of modeling chaotic behaviour. These modeling schemes and their
performance comparison with LPC will be discussed in the next chapter.

3.1 Vocal Tract as a Dynamical System and Acoustic 
Phonetics 

The speech production mechanism has the vocal tract as the underlying
dynamical system. Also, all meaningful sounds can be expressed in terms of
phonemes. This section briefly discusses the vocal tract and the classification
of phonemes according to the vocal tract characteristics during their
utterance. For a good discussion, one may consult [79].
Figure 3.1 shows an X-ray photograph in which the important features of
the vocal tract have been highlighted by dotted lines. The vocal tract begins at
the opening between the vocal cords, or glottis. It consists of the pharynx (the
connection from the esophagus to the mouth) and the mouth, or oral cavity. The
vocal tract ends at the lips. The total length of the vocal tract in an average
male is about 17 cm. The cross-sectional area of the vocal tract, determined by
the positions of the tongue, lips, jaw and velum, varies from zero (i.e., complete
closure) to about 20 cm². The nasal tract, which begins at the velum and ends at
the nostrils, is acoustically coupled to the vocal tract when the velum is lowered
to produce nasal sounds of speech.

Speech sounds can be classified into three classes according to the type of
excitation. Voiced sounds are produced by forcing air through the glottis with
the tension of the vocal cords adjusted so that they vibrate in a relaxation
oscillation. Examples are /b/ as in buy, /m/ as in me, /y/ as in yellow, etc.
Unvoiced or fricative sounds are generated by forming a constriction at some
point in the vocal tract and forcing air through the constriction at a high
enough velocity to produce turbulence, e.g., /s/ as in sap, /S/ as in ship, etc.
Plosive sounds result from making a complete closure (usually towards the front
end of the vocal tract), building up pressure behind the closure, and abruptly
releasing it, e.g., /tS/ as in chip, /t/ as in tie, etc.

Figure 3.1 - An X-ray of the vocal tract system. (Reproduced from [79])

The following discussion deals with the classification of phonemes into
four broad classes, namely vowels, diphthongs, semi-vowels and consonants. Also,
each of these phonemes can be classified as either a continuant or a non-continuant
sound. Continuant sounds are produced by a fixed vocal tract configuration
excited by an appropriate source. They include vowels, fricatives (voiced and
unvoiced) and nasals. The remaining sounds (diphthongs, semi-vowels, stops and
affricates) are produced by a changing vocal tract configuration and are called
non-continuants. Figure 2-10, parts a) and b) in each, show a part of the time
series and the corresponding spectrum of the phonemes /o/, /kl/, A- A /m/, /f/, /v/, /a/,
A/ and A^/, which are representative of the following classes:

1. Vowels - They are produced by exciting a fixed vocal tract with quasi-periodic
pulses of air caused by vibration of the vocal cords. The dependence
of the cross-sectional area upon the distance along the tract is called the area
function of the vocal tract. For a particular vowel, the area function depends
primarily on the position of the tongue, but the positions of the jaw, lips and
velum also influence the sound. Thus, each vowel sound can be characterized by
the vocal tract configuration (or the area function) that is used in its
production. An alternative approach is to treat the vocal tract as a resonance
tube and characterize vowels in terms of the first three resonance frequencies,
which are known as formants in speech terminology. Examples of vowels are /i/ as
in bead, /I/ as in bid, /ae/ as in bad, etc.

2. Diphthongs - A diphthong is described as a gliding monosyllabic speech item that
starts at or near the articulatory position for one vowel and moves to or
towards the position of another. The three diphthongs of the International
Phonetic Alphabet (IPA) are /aI/ as in ride, /aU/ as in out and /oI/ as in boy. Thus,
the diphthongs are produced by varying the vocal tract between appropriate vowel
configurations, and hence they can be characterized by a time-varying area
function.

3. Semi-vowels - Similar to diphthongs, these are characterized by a gliding
transition in the area function between adjacent phonemes and are strongly
influenced by the context in which they occur. Examples are /w/ as in we, /l/ as
in lap and /r/ as in rap.

4. Nasals - The nasal consonants /m/ as in me, /n/ as in not and /ŋ/ as in thing are
produced with glottal excitation and the vocal tract constricted at some point
along the oral passageway. The velum is lowered so that air flows through the
nasal tract and sound is radiated at the nostrils. Nasal consonants and nasalized
vowels (i.e., some vowels preceding or following nasal consonants) are
characterized by resonances which are spectrally broader, or more highly damped,
than those for vowels. The broadening of the nasal resonances is due to the fact
that the inner surface of the nasal tract is convoluted, so that the nasal cavity
has a relatively large ratio of surface area to cross-sectional area. This also
means that heat conduction and viscous losses are larger than normal.

5. Unvoiced Fricatives - The unvoiced fricatives /f/ as in fua, /s/ as in sap, /θ/
as in thought and /S/ as in ship are produced by exciting the vocal tract with a
steady air flow which becomes turbulent in the region of a constriction in the
vocal tract.

6. Voiced Fricatives - The voiced fricatives /v/ as in vine, /z/ as in zip, /ð/ as
in them and /Z/ as in measure are the counterparts of the corresponding unvoiced
fricatives /f/, /s/, /θ/ and /S/, in the sense that the place of constriction is
the same for both. However, voiced fricatives differ from their unvoiced
counterparts in that the former have two excitation sources in their production,
as opposed to one for the latter. For voiced fricatives the vocal cords also
vibrate, and thus the glottis acts as an excitation source.

7. Voiced Stops - The voiced stop consonants are /b/ as in bug, /d/ as in dog and
/g/ as in got. They are transient, non-continuant sounds and are produced by
building up pressure behind a total constriction somewhere in the oral tract and
suddenly releasing the pressure. These sounds are highly influenced by the
vowels which follow the stop consonant. By themselves, the waveforms for stop
consonants give little information about the particular stop consonant.

8. Unvoiced Stops - The unvoiced stop consonants /p/ as in pie, /t/ as in tie and /k/
as in key are similar to their voiced counterparts /b/, /d/ and /g/. One major
difference is that, in this case, during the period of total closure of the vocal
tract, as the pressure builds up, the vocal cords do not vibrate. Thus, following
the period of closure, as the air pressure is released, there is a brief interval
of friction (due to sudden turbulence of escaping air) followed by a period of
aspiration (steady air flow from the glottis exciting the resonances of the vocal
tract) before voiced excitation begins.

9. Affricates and /h/ - The unvoiced affricate /tS/ can be modeled as a
concatenation of the stop /t/ and the fricative /S/. The voiced affricate /j/ can
be modeled as a concatenation of the stop /d/ and the fricative /Z/. Finally, the
phoneme /h/ is produced by exciting the vocal tract with a steady air flow, i.e.,
without the vocal cords vibrating, but with the turbulent flow being produced at
the glottis.

This completes the brief discussion on the classification of phonemes. The
analysis and synthesis of phonemes are important in speech processing
because all meaningful sounds can be broken down into these unit utterances.

3.2 Some Aspects of the Acoustic Theory of Speech Production

In most applications, sound is associated with vibrations of particles in
the medium and the frequencies of these vibrations. Thus, the laws of physics
form the basis for describing the generation and propagation of sound. In
particular, for describing the vocal system, the fundamental laws of
conservation of energy and conservation of momentum, along with the laws of
thermodynamics and fluid mechanics, must be applied to the compressible,
low-viscosity fluid (air) that is the medium for sound propagation in speech. A
complete acoustic theory must include the effects of time variation of the vocal
tract shape, losses due to heat conduction and viscous friction at the vocal tract
walls, softness of the vocal tract walls, radiation of sound at the lips, nasal
coupling and excitation of sound in the vocal tract.

A complete theory incorporating all the above effects is not yet available.
The general approach in building an acoustic theory is to make various
simplifying assumptions. One of the simplest descriptions is to model the vocal
tract as a tube of non-uniform, time-varying cross-section. For frequencies
corresponding to wavelengths that are long compared to the dimensions of the
vocal tract, plane wave propagation along the axis of the vocal tract tube is
assumed. A further simplifying assumption is that there are no losses due to
viscosity or thermal conduction, either in the bulk of the fluid or at the walls of
the tube. Such a model is due to Portnoff [78].
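For the simplest special case of such a lossless tube model, a tube of uniform cross-section that is closed at the glottis and has zero pressure at the lips, the resonances are the odd quarter-wavelength frequencies $F_n = (2n - 1)c/(4L)$. The sketch below evaluates this textbook formula for the 17 cm vocal tract length quoted earlier, assuming a speed of sound of c = 340 m/s. The resulting values of roughly 500, 1500 and 2500 Hz are the familiar formant estimates for a neutral vowel and are not results from this thesis.

```python
def uniform_tube_formants(length_m=0.17, speed_of_sound=340.0, count=3):
    """Resonances (Hz) of a lossless uniform tube, closed at one end and open at the other."""
    # F_n = (2n - 1) c / (4 L): odd multiples of the quarter-wavelength resonance
    return [(2 * n - 1) * speed_of_sound / (4.0 * length_m) for n in range(1, count + 1)]

formants = uniform_tube_formants()   # assumed L = 0.17 m, c = 340 m/s
print(formants)
```

Real vowels shift these resonances away from the uniform-tube values because the area function is non-uniform, which is exactly what the non-uniform tube model above accounts for.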

For more complicated behaviour, models were proposed [74,80] to account for
viscous friction between the air and the walls of the tube, heat conduction
through the walls of the tube and vibration of the tube walls. Deriving a new set
of equations of motion from first principles is extremely difficult because of
the frequency dependence of the losses. Hence, the approach is to modify the
frequency-domain representation of the equations of motion to account for the
above effects. The effects of wall vibration are more pronounced than those of
viscous friction and thermal conduction. However, viscous and thermal losses
increase with frequency and have the greatest effect on the high-frequency
resonances.

Another effect is due to radiation at the lips. The models referred to above assume the pressure at the lips to be zero at all times. In reality, however, the vocal tract tube terminates with an opening between the lips. Hence, a reasonable model should consider the lips as an orifice in a sphere, which then acts as a radiating surface, with the radiated sound being diffracted by the spherical baffle that represents the head. Again, the resulting diffraction effects are complicated and difficult to represent.

It is thus seen that most of the observed effects are difficult to incorporate in physical models derived from first principles. In such a situation, one resorts to phenomenological models, where the observed behaviour is approximately modeled using linear functional representations that are not related to the physical laws governing the system. In such a representation, the system output has the desired speech-like properties when controlled by a set of parameters somehow related to the process of speech production. Some of these model only a class of speech signals, e.g., vowel modeling using extracted formants.

As shall be seen in the following section, most of the phoneme utterances are chaotic in nature, in the sense that they have positive second-order entropy. A similar analysis was done on stationary interval segments of an uttered signal [43]. This means that the vocal system is a non-linear system, because only such systems are capable of displaying chaotic behaviour. In this sense, modeling speech signals under the constraint of linearity may be far from optimal. The following section discusses some practical aspects of finding the correlation dimension and second-order entropy from observed data.

3.3 Estimation of Dimensions and Entropies from Scalar Time Series: Practical Aspects and Results

As mentioned in the previous section, some utterances, e.g., fricatives, are produced by a turbulent flow of air through a narrow constriction. These are modeled using random excitation, while in other utterances, e.g., vowels, one searches for regular behaviour. One of the offshoots of the study of chaotic behaviour has been the proposal of a variety of dimensions that can be used to get an idea of the minimum number of independent variables needed to model observed behaviour. For example, if the dimension is found to be 2.23, then the minimum number of independent variables needed is the next higher integer, which in this case is 3. If the dimension is found to be large, then the number of independent variables needed to model the system would also be large. In such situations, stochastic modeling schemes may be preferred. Similarly, the entropy of a particular trajectory gives an idea of its nature: for regular behaviour, the Kolmogorov entropy $K = 0$; for chaotic behaviour, $K > 0$; while $K \to \infty$ signifies random behaviour. The inverse of the entropy gives an idea of the length of time up to which predictions can be carried out. Thus, for regular behaviour, one can predict for an infinite time, while for random behaviour, the prediction time tends to zero.
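As a rough illustration of the two rules of thumb above, the sketch below turns a dimension estimate into a minimum number of variables and an entropy into an order-of-magnitude prediction horizon, also expressed in samples at the 20.0 kHz sampling rate used in this chapter. The function names are hypothetical, and treating 1/K as the horizon is only the heuristic stated in the text, not a precise bound.

```python
import math

def min_variables(dimension: float) -> int:
    """Minimum number of independent variables suggested by a dimension
    estimate: the integer at or above it (e.g. 2.23 -> 3)."""
    return math.ceil(dimension)

def prediction_horizon_s(entropy_per_s: float) -> float:
    """Heuristic prediction time ~ 1/K, in seconds.

    K = 0 (regular behaviour) gives an infinite horizon; a very large K
    (random behaviour) drives the horizon towards zero.
    """
    if entropy_per_s < 0.0:
        raise ValueError("entropy must be non-negative")
    if entropy_per_s == 0.0:
        return math.inf
    return 1.0 / entropy_per_s

FS_HZ = 20_000.0  # sampling rate used in the experiments of this chapter

for K in (0.0, 1000.0, 7000.0):  # entropies in s^-1
    t = prediction_horizon_s(K)
    print(f"K = {K:7.0f} s^-1 -> horizon {t:.3g} s = {t * FS_HZ:.3g} samples")
```

An entropy of a few thousand s⁻¹ thus corresponds to a horizon of only a few samples at this sampling rate.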

Amongst the three dimensions and the corresponding entropies presented in Chapter 2, the correlation dimension $D_2$ and the second-order entropy $K_2$ are most frequently used because of their ease of computation. They use the attractor reconstruction theorems presented in Chapter 2. The theorems ensure that it is only necessary to observe one variable of the system evolving in time. It is not necessary to measure all the n variables of the system. In fact, it is also not necessary to know the value n (i.e., the dimension of the real space) to estimate the attractor dimension. The measurement of a single variable $y_i$, i = 0, 1, ..., is sufficient because it contains all the relevant information about the other (n-1) variables in its time derivatives $d^k y/dt^k$.

An analytical procedure that required knowledge of all the n variables would be impractical. Instead, the idea is to reconstruct a d-dimensional vector time series $X_i$, i = 0, 1, ..., from the observed variable $y_i$, i = 0, 1, ..., using delays as follows:

$$
\begin{aligned}
X_0 &= [\,y_0,\; y_m,\; y_{2m},\; \ldots,\; y_{(d-1)m}\,] \\
X_1 &= [\,y_1,\; y_{1+m},\; y_{1+2m},\; \ldots,\; y_{1+(d-1)m}\,] \\
&\;\;\vdots \\
X_i &= [\,y_i,\; y_{i+m},\; y_{i+2m},\; \ldots,\; y_{i+(d-1)m}\,] \qquad (3.3.1)
\end{aligned}
$$
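A minimal NumPy sketch of the delay reconstruction in equation (3.3.1), assuming the observed samples $y_i$ are held in a one-dimensional array; the function name `delay_embed` is hypothetical. Each row of the returned matrix is one reconstructed vector $X_i$.

```python
import numpy as np

def delay_embed(y, d, m=1):
    """Delay reconstruction of equation (3.3.1).

    Row i of the result is X_i = [y_i, y_{i+m}, ..., y_{i+(d-1)m}].
    """
    y = np.asarray(y, dtype=float)
    n_vectors = len(y) - (d - 1) * m  # number of complete delay vectors
    if d < 1 or m < 1 or n_vectors <= 0:
        raise ValueError("need d >= 1, m >= 1 and a long enough series")
    # Column j holds the samples delayed by j*m.
    return np.column_stack([y[j * m : j * m + n_vectors] for j in range(d)])

X = delay_embed(np.arange(6), d=3, m=2)
print(X)  # rows [0, 2, 4] and [1, 3, 5]
```

Slicing column by column avoids an explicit loop over i; the last column ends exactly at the final sample.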

There is no hard and fast rule for selecting the value of the delay m in equation (3.3.1) above. Some guidelines have been proposed which are in some sense optimal. The central idea in choosing the proper delay is that the coordinates of the reconstructed state-space be as uncorrelated as possible. A simple procedure to choose m would be to look for the decay or the first zero of the autocorrelation function $\langle y_{i+m}\, y_i \rangle$. This is often unacceptable because the time series may have periodicities such that the autocorrelation function oscillates for quite a few delays. Also, zero autocorrelation means only second-order uncorrelatedness. Another procedure to find m is to use the delay corresponding to the first minimum of the mutual information [56,58].
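The simple autocorrelation rule mentioned above can be sketched as follows, using the usual biased sample estimate of the normalized autocorrelation; the function name and the `max_lag` cut-off are illustrative assumptions. As the text notes, a zero here only guarantees second-order uncorrelatedness.

```python
import numpy as np

def first_zero_autocorr(y, max_lag=None):
    """Smallest lag m >= 1 at which the normalized sample autocorrelation
    of y drops to zero or below; None if it never does up to max_lag."""
    y = np.asarray(y, dtype=float)
    y = y - y.mean()                      # remove the mean first
    energy = np.dot(y, y)
    if energy == 0.0:
        raise ValueError("constant series has no defined autocorrelation")
    if max_lag is None:
        max_lag = len(y) // 2
    for lag in range(1, max_lag + 1):
        r = np.dot(y[:-lag], y[lag:]) / energy   # <y_{i+lag} y_i> / <y_i^2>
        if r <= 0.0:
            return lag
    return None

# A sine of period 20 samples: the first zero is close to a quarter period.
t = np.arange(2000)
print(first_zero_autocorr(np.sin(2 * np.pi * t / 20)))
```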

For information theory to apply, the probabilities of the messages considered must exist. The application of information theory to strange attractors is justified because they are ergodic and have well-defined asymptotic probability distributions. Thus, the probabilities of the messages exist, and long-time averages converge to probabilities. The following discussion briefly reviews the mutual information concept as applicable to the problem of state-space reconstruction.

Consider a process in which messages are being sent. Let S denote the system, which consists of a set of possible messages $s_1, s_2, \ldots, s_n$ and associated probabilities $P_s(s_1), P_s(s_2), \ldots, P_s(s_n)$. The average amount of information gained from a measurement that specifies s is the entropy H of the system,

$$
H(S) = -\sum_i P_s(s_i)\,\log P_s(s_i). \qquad (3.3.2)
$$

In the present problem, one is interested in measuring how dependent the values of $y_{n+k}$ are on $y_n$. By making the assignment $[s, q] = [y_n, y_{n+k}]$, one can consider the general coupled system (S, Q). Then $H(Q|s_i)$ denotes the uncertainty in the measurement of q given that s was found to be $s_i$. $H(Q|s_i)$ is given by

$$
\begin{aligned}
H(Q|s_i) &= -\sum_j P_{q|s}(q_j|s_i)\,\log\left[P_{q|s}(q_j|s_i)\right] \qquad (3.3.3)\\
&= -\sum_j \frac{P_{sq}(s_i,q_j)}{P_s(s_i)}\,\log\left[\frac{P_{sq}(s_i,q_j)}{P_s(s_i)}\right], \qquad (3.3.4)
\end{aligned}
$$

where $P_{q|s}(q_j|s_i)$ is the probability that a measurement of q will yield $q_j$, given that the measured value of s is $s_i$.

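Similarly, $H(Q|S)$ denotes the average uncertainty in the measurement of $y_{n+k}$ given that $y_n$ is measured. It is given by

$$
\begin{aligned}
H(Q|S) &= \sum_i P_s(s_i)\, H(Q|s_i) \\
&= -\sum_{i,j} P_{sq}(s_i,q_j)\,\log\left[\frac{P_{sq}(s_i,q_j)}{P_s(s_i)}\right] \\
&= H(S,Q) - H(S), \qquad (3.3.5)
\end{aligned}
$$

where

$$
H(S,Q) = -\sum_{i,j} P_{sq}(s_i,q_j)\,\log P_{sq}(s_i,q_j). \qquad (3.3.6)
$$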

$H(Q)$ denotes the uncertainty of q in isolation and $H(Q|S)$ the uncertainty of q given a measurement of s. So the amount by which a measurement of s reduces the uncertainty of q is

$$
\begin{aligned}
I(Q,S) &= H(Q) - H(Q|S) \\
&= H(Q) + H(S) - H(S,Q) \\
&= I(S,Q). \qquad (3.3.7)
\end{aligned}
$$

This is called the mutual information. If all the logarithms are to the base 2, then H is in units of bits. Mutual information answers the question of how many bits, on the average, can be predicted about q given a measurement of s. Finding the value of k at which the minimum of the mutual information of the system (S, Q) is observed is, in effect, finding the delay k at which the values $y_n$ and $y_{n+k}$ are least dependent.
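The quantities in equations (3.3.2)-(3.3.7) can be estimated from a single time series by histogramming the pairs $(y_n, y_{n+k})$. The sketch below is a minimal version under stated assumptions: it uses equal-width bins (the bin count of 16 is arbitrary, and coarse histograms bias the estimate), base-2 logarithms so that the result is in bits, and hypothetical function names. It is one simple estimator, not necessarily the one used in the cited work.

```python
import numpy as np

def mutual_information(y, k, bins=16):
    """Histogram estimate, in bits, of I(S,Q) with s = y_n and q = y_{n+k}."""
    if k < 1:
        raise ValueError("delay k must be at least 1")
    y = np.asarray(y, dtype=float)
    s, q = y[:-k], y[k:]
    counts, _, _ = np.histogram2d(s, q, bins=bins)
    p_sq = counts / counts.sum()           # joint probabilities P_sq(s_i, q_j)
    p_s = p_sq.sum(axis=1)                 # marginal P_s(s_i)
    p_q = p_sq.sum(axis=0)                 # marginal P_q(q_j)
    nz = p_sq > 0                          # terms with P_sq = 0 contribute nothing
    ratio = p_sq[nz] / np.outer(p_s, p_q)[nz]
    # I(S,Q) = H(S) + H(Q) - H(S,Q) = sum P_sq log2[P_sq / (P_s P_q)]
    return float(np.sum(p_sq[nz] * np.log2(ratio)))

def first_minimum_delay(y, max_k=20, bins=16):
    """Delay k (1 <= k < max_k) at the first local minimum of I(k), else None.

    I(0) = H(S) is an upper bound on I(k), so k = 1 is accepted as the first
    minimum whenever I(1) <= I(2).
    """
    mi = [mutual_information(y, k, bins) for k in range(1, max_k + 1)]
    for idx in range(len(mi) - 1):
        falling = idx == 0 or mi[idx] < mi[idx - 1]
        if falling and mi[idx] <= mi[idx + 1]:
            return idx + 1                 # list index 0 corresponds to k = 1
    return None

rng = np.random.default_rng(0)
noise = rng.standard_normal(5000)          # nearly independent samples
smooth = np.sin(0.05 * np.arange(5000))    # strongly dependent neighbours
print(mutual_information(noise, 1), mutual_information(smooth, 1))
print(first_minimum_delay(noise))
```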

The results of the application of this technique to phoneme time series will be presented after a description of the experimental set-up.

The experimental set-up consisted of an 8-bit analog-to-digital converter which sampled the intensity of all phoneme utterances (in our case, the phonemes of the IPA) at 20.0 kHz and stored them in separate files. The data from each of these phoneme time series were used for the diagrams and calculations.

Figures 3.2-3.10, part (c), show the reconstructed phase-space trajectories $y_{n+k}$ vs $y_n$ for phonemes /u/, /4il/, /r/, /m/, /f/, /v/, /g/, /k/ and /tʃ/ at k = 2. Successive points on the trajectory have been joined by straight lines, assuming that the actual trajectories can be approximated thus. Figure 3.11 shows the trajectories of vowels /u/, /U/, /o/, /d/ and /a/ using k = 2. It shows the slow change in overall shape from one phoneme trajectory to the next.

The first minimum of the mutual information was calculated for all phoneme time series to obtain the best delay k. For 30 phonemes, this was observed at k = 1, for 9 phonemes it was observed at k = 2, while for the remaining ones it was observed at k = 4.


(Caption fragment) ...j/, type - vowel, sampled at 20.0 kHz ... b) First 160 points of a 1024-point ... In all the three cases, adjacent ...

Figure 3.4 - All graphs for phoneme /r/, type - semivowel, sampled at 20.0 kHz. a) Part of the time-series, b) First 200 points of a 1024-point DFT, c) Phase-space plot using 900 points.

Figure 3.5 - All graphs for phoneme /m/, type - nasal, sampled at 20.0 kHz. a) Part of the time-series, b) First 200 points of a 1024-point DFT, c) Phase-space plot using 600 points of the time-series.

Figure 3.6 - All graphs for phoneme /f/, type - unvoiced fricative, sampled at 20.0 kHz. a) Part of the time-series, b) First 200 points of a 1024-point DFT, c) Phase-space plot using 250 points.

Figure 3.7 - All graphs for phoneme /v/, type - voiced fricative, sampled at 20.0 kHz. a) Part of the time-series, b) First 100 points of a 1024-point DFT, c) Phase-space plot using 500 points.

Figure 3.8 - All graphs for phoneme /g/, type - voiced stop, sampled at 20.0 kHz. a) Part of the time-series, b) First 100 points of a 1024-point DFT, c) Phase-space plot using 600 points.

Figure 3.9 - All graphs for phoneme /k/, type - unvoiced stop, sampled at 20.0 kHz. a) Part of the time-series, b) First 100 points of a 1024-point DFT, c) Phase-space plot using 600 points.

Figure 3.10 - All graphs for phoneme /tʃ/, type - unvoiced affricate, sampled at 20.0 kHz. a) Part of the time-series, b) First 300 points of a 1024-point DFT, c) Phase-space plot using 250 points.

Figure 3.11 - Phase-space plots y(n+k) vs. y(n) using k = 2 for various vowels: a) Phoneme /u/, b) Phoneme /U/, c) Phoneme /o/, d) Phoneme /d/ and e) Phoneme /a/. Successive points on the trajectory have been joined by straight lines, assuming that the actual trajectories can be approximated thus. The overall shape of the trajectories changes slowly in the order of the phonemes above.

As discussed in Chapter 2, the correlation or second-order dimension $D_2$ is given by

$$
D_2 = \lim_{r \to 0} \frac{\log C(r)}{\log r}, \qquad (3.3.8)
$$

where

$$
C(r) = \lim_{N \to \infty} \frac{1}{N^2} \sum_{i,j=1}^{N} \theta\left(r - |X_i - X_j|\right), \qquad (3.3.9)
$$

and θ is the Heaviside function: θ(x) = 1 for x > 0 and θ(x) = 0 otherwise. The function C(r) counts the pairs of points $X_i$, $X_j$ whose distance $|X_i - X_j|$ is less than r, normalized by the total number of pairs $N^2$.
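A direct sketch of the correlation sum in equation (3.3.9) for a finite set of reconstructed vectors, for example the rows of a delay reconstruction. Two assumptions go beyond the text: the self-pairs i = j are excluded (they contribute only 1/N and are conventionally dropped), and all pairwise distances are held in memory, which is fine for N = 500 but not for very long series. The function name is hypothetical.

```python
import numpy as np

def correlation_sum(X, r):
    """Fraction of pairs i < j with Euclidean distance |X_i - X_j| < r.

    This is a finite-N version of equation (3.3.9), with the Heaviside
    step theta(r - |X_i - X_j|) evaluated as a strict inequality.
    """
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]                    # treat a scalar series as d = 1
    n = len(X)
    if n < 2:
        raise ValueError("need at least two points")
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt(np.sum(diff ** 2, axis=-1))
    upper = np.triu_indices(n, k=1)       # each unordered pair once, i != j
    return float(np.mean(dist[upper] < r))

# Only the pair (0, 1) is closer than 1.5, i.e. one pair out of three.
print(correlation_sum([[0.0], [1.0], [3.0]], 1.5))
```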

In practice, the dimension d of the reconstructed vector time series $X_i$ is increased from 1. For each value of d, $X_i$ is used to find C(r) for decreasing r. The hypothesis in finding the correlation dimension is that C(r) scales as a power law with r, i.e.,

$$
C(r) \propto r^{D_2}. \qquad (3.3.10)
$$
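In the spirit of equation (3.3.10), $D_2$ for a given embedding dimension can be read off as the least-squares slope of log C(r) against log r over a chosen range of r. The sketch below assumes this range (`r_min`, `r_max`) has already been picked inside the linear scaling region, above any noise knee; choosing it is the delicate part in practice and is not automated here. The names are hypothetical, and the self-check uses synthetic point sets of known dimension rather than speech data.

```python
import numpy as np

def correlation_sums(X, radii):
    """C(r) for each r in radii, counting pairs i < j (Euclidean norm)."""
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt(np.sum(diff ** 2, axis=-1))
    pair_dist = dist[np.triu_indices(len(X), k=1)]
    return np.array([np.mean(pair_dist < r) for r in radii])

def d2_slope(X, r_min, r_max, n_r=12):
    """Least-squares slope of log C(r) against log r on [r_min, r_max]."""
    radii = np.geomspace(r_min, r_max, n_r)
    C = correlation_sums(X, radii)
    keep = C > 0                          # log C is undefined where no pair is close
    if keep.sum() < 2:
        raise ValueError("too few non-empty radii to fit a slope")
    slope, _intercept = np.polyfit(np.log(radii[keep]), np.log(C[keep]), 1)
    return float(slope)

# Sanity check on point sets of known dimension (1 and 2).
rng = np.random.default_rng(1)
u = rng.uniform(size=1000)
line = np.column_stack([u, 2.0 * u])      # points on a straight segment in the plane
square = rng.uniform(size=(1000, 2))      # points filling the unit square
print(d2_slope(line, 0.01, 0.1), d2_slope(square, 0.01, 0.1))
```

For a speech segment, one would repeat the fit on delay reconstructions of increasing d and look for the saturation of the slope.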

Takens' theorems can be used to show that if d is sufficiently large, then the correlation dimension $D_2$ of the original system (i.e., the actual n-variable system from which a single variable is measured) and that of the reconstructed d-dimensional system will be the same. As d is increased from 1, $D_2$ will also increase until it reaches its correct value. Increases in d beyond this point will not affect $D_2$ beyond errors, although the calculations become increasingly slow as d is increased. One then plots the correlation dimension versus the reconstruction or embedding dimension. If the curve keeps growing (i.e., $D_2$ continues to increase with d), then the system approaches truly random behaviour within the limits of experimental observation.

This procedure even provides a scheme for detecting a low-dimensional attractor that is contaminated by noise [26]. If noise of magnitude ε is added to deterministic data, a plot of log C(r) versus log r will have a knee at ε. Figure 3.12(a) shows the log C(r) vs log r plot for increasing d (i.e., d = 1, 2, ..., 18) for the phoneme /θ/. The slope above the knee gives the correct dimension for the deterministic system, which in this case is the vocal system. The slope below the knee gives the dimension corresponding to the noise. Figure 3.12(b) shows the plot of $D_2$ vs d for the same phoneme. The saturated curve shows that $D_2$ for the vocal tract configuration during the utterance of the phoneme /θ/ is 2.17 ± 0.07.
Similar to the correlation dimension $D_2$, the second-order entropy can be defined in terms of the correlation function C(r) as follows:

$$
K_2 = \lim_{r \to 0} \lim_{d \to \infty} \frac{1}{\tau} \log \frac{C_d(r)}{C_{d+1}(r)}, \qquad (3.3.11)
$$

where $C_d(r)$ is C(r) given by equation (3.3.9) for the embedding dimension d, and τ is the time difference between successive points on the reconstructed vector time series, which in our case is 0.05 ms (the sampling interval at 20.0 kHz). Thus, $K_2$ can be estimated from the vertical distances (at the same values of r) between the curves belonging to successive dimensions d in the log C(r) vs log r plots for increasing d (e.g., Figure 3.12(a)). Figure 3.12(c) shows the plot of $\tau^{-1}\log[C_d(r)/C_{d+1}(r)]$ for increasing d for the phoneme /θ/. The value indicated for each particular d was calculated from the mean value of $C_d(r)/C_{d+1}(r)$ over the linear range of r above the knee (i.e., above the noise amplitude) in Figure 3.12(a). The limiting value of $\tau^{-1}\log[C_d(r)/C_{d+1}(r)]$ for large d gives the second-order entropy $K_2$, which in this case is 6082 ± 312 s⁻¹.
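Equation (3.3.11) suggests the following finite-d, finite-r estimate: reconstruct the series in d and d + 1 dimensions with unit delay, compute both correlation sums at the same r, and scale the logarithm of their ratio by 1/τ. This is a sketch under stated assumptions: the same number of delay vectors is used for both dimensions so that the sums are comparable, natural logarithms are used, and τ defaults to the 0.05 ms sampling interval. In practice, as described above, the ratio would be averaged over the linear range of r and followed to large d; the function names are hypothetical.

```python
import numpy as np

def _corr_sum(X, r):
    """Fraction of pairs i < j with Euclidean distance below r."""
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt(np.sum(diff ** 2, axis=-1))
    return float(np.mean(dist[np.triu_indices(len(X), k=1)] < r))

def k2_estimate(y, d, r, tau=0.05e-3):
    """(1/tau) * ln[C_d(r) / C_{d+1}(r)] using unit-delay reconstructions."""
    y = np.asarray(y, dtype=float)
    n = len(y) - d                        # vectors available in d + 1 dimensions
    if d < 1 or n < 2:
        raise ValueError("series too short for this embedding dimension")
    X_d = np.column_stack([y[j:j + n] for j in range(d)])
    X_d1 = np.column_stack([y[j:j + n] for j in range(d + 1)])
    c_d, c_d1 = _corr_sum(X_d, r), _corr_sum(X_d1, r)
    if c_d1 == 0.0:
        raise ValueError("no pairs closer than r in d + 1 dimensions")
    return float(np.log(c_d / c_d1) / tau)

periodic = np.sin(2 * np.pi * np.arange(1000) / 50)   # exactly periodic signal
noise = np.random.default_rng(2).uniform(size=600)    # independent samples
print(k2_estimate(periodic, d=2, r=1e-6), k2_estimate(noise, d=2, r=0.3))
```

Because the first d coordinates of each (d + 1)-dimensional vector are the d-dimensional vector, $C_{d+1}(r) \le C_d(r)$ and the estimate is never negative; it is zero for the exactly periodic signal and clearly positive for the noise.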


Table 3.1 shows the correlation dimension $D_2$, while Table 3.2 shows the second-order entropy $K_2$, for all phonemes of the IPA. Further experimental details and results are discussed below.

1. Because the vocal tract changes configuration for non-continuant phonemes, and minor changes occur even otherwise, we had to base our calculations on N = 500 data points. For the calculation of the correlation function, a large value of N is required. Therefore, we sampled the phonemes at a relatively high sampling rate of 20.0 kHz, which allowed us to use 500 data points under the assumption that the vocal tract configuration does not change for 25 ms (500 samples × 0.05 ms).

2. The log C(r) vs log r plots for different embedding dimensions d have a sufficiently large linear range to arrive at conclusive results. The dimensions and entropies given in Tables 3.1 and 3.2, respectively, are not contaminated by low
Phoneme   Type                  Correlation Dimension D2
/!/       Vowel-Front           3.10 ± 0.19
/I/       -do-                  2.23 ± 0.41
/e/       -do-                  2.50 ± 0.10
/3/       -do-                  2.74 ± 0.28
/ae/      -do-                  2.00 ± 0.23
/a/       -do-                  2.60 ± 0.10
/u/       Vowel-Back            2.00 ± 0.10
/U/       -do-                  1.74 ± 0.13
/o/       -do-                  1.70 ± 0.18
/o/       Vowel-Mid             1.88 ± 0.05
/a/       -do-                  2.06 ± 0.04
/A/       -do-                  2.07 ± 0.14
/&/       -do-                  2.20 ± 0.10
/3V       -do-                  2.00 ± 0.20
/■('/     -do-                  2.42 ± 0.08
/w/       Semivowel-Liquid      4.44 ± 0.14
/V        -do-                  6.70 ± 0.07
/!/       -do-                  1.86 ± 0.01
/r/       Semivowel-Glide       1.42 ± 0.24
/J/       -do-                  3.23 ± 0.21
/al/      Diphthong             3.40 ± 0.08
/aU/      -do-                  2.11 ± 0.21
Al/       -do-                  2.70 ± 0.10
/b/       Voiced Stop           2.90 ± 0.45
/d/       -do-                  2.50 ± 0.64
/g/       -do-                  2.56 ± 0.12
/p/       Unvoiced Stop         2.28 ± 0.64
/t/       -do-                  2.90 ± 0.19
/k/       -do-                  2.69 ± 0.39
/v/       Voiced Fricative      5.70 ± 0.45
/z/       -do-                  4.99 ± 0.31
/^/       -do-                  2.65 ± 0.08
/5/       -do-                  3.17 ± 0.29
/f/       Unvoiced Fricative    5.80 ± 0.45
/s/       -do-                  5.50 ± 0.27
/θ/       -do-                  2.17 ± 0.07
/T/       -do-                  7.90 ± 0.40
/dʒ/      Affricate             1.64 ± 0.18
/tʃ/      -do-                  6.30 ± 0.10
/h/       Whisper               5.79 ± 0.28
/m/       Nasal                 4.11 ± 0.70
/n/       -do-                  2.72 ± 0.37
Al/       -do-                  2.76 ± 0.20

TABLE 3.1 - Correlation dimension D2 for all phonemes of the IPA. The phoneme utterances were sampled at 20 kHz using an 8-bit ADC. N = 500 data points were used. The error ranges were calculated from the extrapolation of the worst possible slopes of the linear range of the log C(r) vs log r curves for each phoneme.
Phoneme   Type                  Second-Order Entropy K2 (s⁻¹)
A/        Vowel-Front           7100 ± 2064
/I/       -do-                  5140 ± 1712
/e/       -do-                  6260 ± 290
/3/       -do-                  8480 ± 1768
/ae/      -do-                  4729 ± 968
/a/       -do-                  5230 ± 1480
/u/       Vowel-Back            2600 ± 1084
/U/       -do-                  3898 ± 688
/o/       -do-                  1804 ± 48
/3/       Vowel-Mid             1280 ± 578
/a/       -do-                  1760 ± 1960
/A/       -do-                  6740 ± 392
/£>/      -do-                  6370 ± 400
/jy       -do-                  6005 ± 2124
/t/       -do-                  4830 ± 1824
/w/       Semivowel-Liquid      9446 ± 3316
M/        -do-                  6474 ± 480
/!/       -do-                  5010 ± 768
/r/       Semivowel-Glide       790 ± 1252
/}/       -do-                  6310 ± 888
/al/      Diphthong             7110 ± 2496
/aU/      -do-                  7840 ± 1840
/*)(/     -do-                  5801 ± 824
/b/       Voiced Stop           8665 ± 2116
/d/       -do-                  7100 ± 880
/g/       -do-                  6228 ± 1732
/p/       Unvoiced Stop         480 ± 3776
/t/       -do-                  7310 ± 2212
/k/       -do-                  10300 ± 1328
/v/       Voiced Fricative      11910 ± 8100
/z/       -do-                  12200 ± 1152
/§/       -do-                  7400 ± 2112
/3/       -do-                  7406 ± 1800
/f/       Unvoiced Fricative
/s/       -do-                  36735 ± 130
/θ/       -do-                  6082 ± 312
/!/       -do-                  25995 ± 1310
/dʒ/      Affricate             600 ± 2240
/tʃ/      -do-                  *
/h/       Whisper               7362 ± 5020
/m/       Nasal                 7717 ± 716
/n/       -do-                  7160 ± 180
          -do-                  5906 ± 400

TABLE 3.2 - The second-order entropy K2 for all phonemes of the IPA uttered by an adult male. N = 500 points were used for the calculations. The error ranges correspond to the worst possible slopes of the linear range of the log C(r) vs log r plots.
amplitude noise. The presence of low-amplitude noise was distinguished from the actual data in the log C(r) vs log r plots, as discussed earlier in this section. The error bounds given for the estimated dimensions and entropies are based on the worst slopes from the linear range of the log C(r) vs log r plots for different d.

3. Simulations done for N = 1000 and 1500 on sustained utterances of some of the continuant phonemes showed very little variation of the dimension.

4. Most of the attractors of vowel time series have dimension less than 3.0. Consonants were generally found to have higher dimension than vowels. This shows the more complex behaviour of consonants, particularly fricatives, as can also be seen from a comparison of the phase-space plots of fricatives with those of vowels. Studies of utterances of 3 adult males showed that the dimensions of the unvoiced phonemes and their voiced counterparts are nearly the same. Thus, the role of excitation is not brought out by the dimension invariant, assuming that the vocal tract configuration remains the same for both.

5. A look at the second-order entropy $K_2$ given in Table 3.2 shows that for almost all phonemes it is positive within the error range. Since $K_2 \le K$, a positive $K_2$ implies $K > 0$, which is a sufficient condition for chaos; this means that most phoneme time series are chaotic in nature. For almost all vowels, $K_2$ lies in the range 1000 s⁻¹ to 8000 s⁻¹, while for most consonants it lies in the range 6000 s⁻¹ to 25 000 s⁻¹ and is concentrated in the region of 7000 s⁻¹ to 12 000 s⁻¹.

3.4 Concluding Remarks 

The main point of this chapter was to present the results of the dimensional analysis of attractors, and of the entropies of the trajectories, reconstructed from all phoneme utterances of the IPA. This was done after an introduction to the vocal tract, which is the underlying dynamical system, and to all the classes of phonemes which form the basis of our analysis throughout the thesis. It was also pointed out that a model of the vocal tract that incorporates all the observed effects is not yet available. The fact that we do not know the dimension of the state space in which the vocal system actually resides does not hamper our investigation of the dimensions of the attractors in which the trajectories eventually reside. This is because of the powerful theorems of Takens discussed in the previous chapter. Finally, the analysis showed that the underlying attractors of phonemes are low-dimensional. This means that only a few variables are needed to model the utterances deterministically using non-linear functional representations, and makes a strong case for their study. The next chapter discusses the signal modeling problem in this framework.



CHAPTER 4 


SIGNAL MODELING: THEORY AND RESULTS OF
APPLICATION TO SPEECH SIGNALS

As mentioned in Chapter 1, it is now known that apparently random behaviour can arise from simple non-linear systems. Until recently, the only tool to analyse such behaviour was based on Kolmogorov's theory of random processes. Alternatively, one may postulate that a time series is the evolution of a dynamical system, because most natural phenomena can be modeled as such. Coupled with this, the assumption that randomness arises out of chaos rather than intractable complexity leads to a fundamentally different approach to signal modeling, which is the main theme of this chapter. In Chapter 1, a brief review of the auto-regressive and state-space modeling schemes was presented. Signal modeling in the present framework differs from auto-regressive schemes in two main respects. Firstly, there is no assumption of a stationary random process generating the time series, and secondly, the present modeling scheme uses non-linear functional representations as opposed to the linear ones in the auto-regressive schemes. The present modeling scheme also differs from the deterministic state-space modeling schemes in the following respects. Firstly, there is no attempt to linearize trajectories using the concept of nominal trajectories, because complex behaviour can arise out of the simple non-linear systems which one tries to model in the present framework. Secondly, the traditional state-space modeling schemes require knowledge of the variables involved or of the state-space itself, whereas in the present scheme the observation of a single variable allows us to reconstruct the state-space.

The chapter begins with the theoretical framework for the signal modeling problem. Then some practical aspects of the problem are discussed. Next, a discussion of the various types of functional representations is given, followed by the two approximation techniques, global and local, which are used to obtain the parameters of the model. The local approximation technique is not practical from the storage and transmission point of view because it requires a prohibitively large model order, but it has better prediction properties than the global approximation technique. We thus propose and study a compromised overlapping neighbourhood (CON)-local approximation technique that reduces the model order and yet tries to retain the better prediction properties of the local approximation technique. The prediction properties of the global and CON-local approximation techniques are compared with the LPC using speech data in the form of phoneme utterances. Finally, the numerical complexity of the two approximation techniques is compared with that of the LPC.

4.1 The Theoretical Framework 

Let $x_i$, i = 1, 2, ..., be a scalar time series evolving in time. Consider it to be the observed variable of a dynamical system with a map or vector field $v: M \to M$, M being a compact manifold on which the phase space of the system is located. That is, there exists an observable on the dynamical system, which is a smooth function $y: M \to \mathbb{R}$ such that $x_i = y(z_i)$, $1 \le i < \infty$, where $z_i$ is the point on the manifold corresponding to which $x_i$ is the observed variable. Use $\Phi: M \to \mathbb{R}^k$, defined as

$$
\Phi(z_i) = \left[\, y(z_i),\; y(v(z_i)),\; \ldots,\; y(v^{k-1}(z_i)) \,\right], \qquad (4.1.1)
$$

to embed the attractor in the real space $\mathbb{R}^k$. Takens' theorem requires for sufficiency that $k \ge 2m + 1$, where m is the dimension of the manifold M. Let $f: \mathbb{R}^k \to \mathbb{R}^k$ be a smooth map such that $\Phi(z_i) = f^{\,i}(\Phi(z_0))$ describes the embedded trajectory in $\mathbb{R}^k$. The signal modeling problem then is to construct a smooth map $\hat f: \mathbb{R}^k \to \mathbb{R}^k$, using only a finite number of iterates $\Phi(z_i)$, $1 \le i \le N$, for which

$$
\hat f(\Phi(z_i)) \approx \Phi(z_{i+1}), \qquad 1 \le i \le N-1.
$$

A simplification of notation is now in order. As the signal modeling problem deals solely with the embedded trajectory in $\mathbb{R}^k$, $\Phi(z_i)$ will henceforth be replaced by $X_i$, which is a vector of the k delayed elements of the observed time series $x_i$, i = 1, 2, ..., i.e.,

$$
X_i = [\, x_i,\; x_{i+1},\; \ldots,\; x_{i+k-1} \,]. \qquad (4.1.2)
$$

For k = 1, the signal modeling problem reduces to finding a good approximation to f. For k > 1, this becomes a problem of geometrically fitting k smooth functions $\pi_j \hat f: \mathbb{R}^k \to \mathbb{R}$ through the data points $(X_i, \pi_j X_{i+1})$, j = 1, ..., k, where $\pi_j$ denotes the projection onto the j-th coordinate. $\hat f$ is, thus, a predictor.

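To measure the efficacy of $\hat f$ as a predictor, the normalized mean square error, or prediction error, E is used as follows:

$$
E^2 = \frac{\left\langle \left| X_{i+1} - \hat f(X_i) \right|^2 \right\rangle}{\left\langle \left| X_i - \langle X_i \rangle \right|^2 \right\rangle}. \qquad (4.1.3)
$$

Thus, E = 0 for perfect prediction and E = 1 for predictions made using $X_{i+1} = \langle X_n \rangle$, i.e., always predicting the mean.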
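A minimal sketch of the error measure in equation (4.1.3), assuming the actual vectors $X_{i+1}$ and the predictions $\hat f(X_i)$ are stacked row-wise in two arrays of equal shape. One assumption beyond the text: the denominator is computed over the same set of actual vectors as the numerator. The function name is hypothetical.

```python
import numpy as np

def prediction_error(actual, predicted):
    """Normalized RMS prediction error E of equation (4.1.3).

    actual, predicted: arrays of shape (n,) or (n, k); rows are vectors.
    """
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    if actual.ndim == 1:                  # scalar predictions -> column vectors
        actual, predicted = actual[:, None], predicted[:, None]
    if actual.shape != predicted.shape:
        raise ValueError("actual and predicted must have the same shape")
    num = np.mean(np.sum((actual - predicted) ** 2, axis=1))
    den = np.mean(np.sum((actual - actual.mean(axis=0)) ** 2, axis=1))
    return float(np.sqrt(num / den))

x = np.array([0.2, 0.9, 0.4, 0.7])
print(prediction_error(x, x))                            # perfect prediction
print(prediction_error(x, np.full_like(x, x.mean())))    # always predict the mean
```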

4.2 Practical Considerations in Model Building 

When a signal is measured over a period of time, it is assumed that the underlying dynamics of the system does not change. Considering the signal $x_i$, i = 1, 2, ..., to be the observed variable of the system, the dimension of the attractor is calculated. Usually, one calculates the correlation dimension $D_2$ because of its ease of implementation. Next, the state-space reconstruction process involves the embedding of the observed signal in $\mathbb{R}^k$. That is, one must reconstruct the trajectory in a k-dimensional state-space from a single variable using delays, as given by equation (4.1.2). While Takens showed that k = 2r + 1 is sufficient, where r is the dimension of the manifold containing the attractor, a necessary condition for determinism is $k \ge D_2$. In practical situations, k = 2r + 1 is rarely needed. Moreover, if the dynamics can be described in a lower-dimensional real space, this provides a lower model order description, which is advantageous from the numerical computation viewpoint, particularly in real-time operations. In equation (4.1.2), the delay between successive samples is equal to one. In principle, this delay can be arbitrary. In practice, however, if the delay is too small, the coordinates become nearly singular, so that successive coordinates are almost equal; if it is too big, then chaos makes the coordinates causally disconnected. A more systematic procedure based on mutual information was discussed in Chapter 3. The application of this method to all phoneme time series showed that most of them gave a minimum of mutual information at a delay equal to one. Hence, in all subsequent work we used this delay in reconstructing the state-space.

4.3 Functional Representations 

Once the state-space reconstruction process is complete, the next task is to fit a model to the data. For the predictor $\hat f$ to approximate chaotic dynamics as well, one must consider non-linear models. From an infinite variety of such representations, an ad hoc choice is made in the absence of greater theoretical understanding at the present juncture. Descriptions of some such representations, due to Farmer and Sidorowich [67], follow.

1. Polynomials - They are the most widely used form of representation, partly because obtaining their parameters is a linear problem under the least squares criterion. Linear forms are used in auto-regressive and moving-average models, while the more general m-th degree non-linear polynomials are used in deterministic state-space modeling techniques, all of which have been reviewed in Chapter 1.


An m-th degree, d-dimensional polynomial is

$$
A_m(x_1, \ldots, x_d) = \sum_{i_1=0,\ldots,i_d=0}^{m} a_{i_1 \ldots i_d}\, x_1^{i_1} \cdots x_d^{i_d}, \qquad (4.3.1)
$$

where $i_1 + i_2 + \cdots + i_d \le m$. The number of parameters $a_{i_1 \ldots i_d}$ is $(m+d)!/(m!\,d!)$. Fitting so many parameters for large m and d is impractical. Also, polynomials have the disadvantage that they do not extrapolate well beyond their domain of validity.

2. Rational approximations - These are ratios of polynomials. They extrapolate better than polynomials. Also, fitting their parameters under the least squares criterion is a linear problem.

3. Radial basis functions - These were recently suggested in the context of non-linear modeling by Casdagli [63]. In their simplest form they depend only on the distance between points. Thus,

$$
R(X) = \sum_{n=1}^{N} \lambda_n\, \phi\left( \| X - X_n \| \right), \qquad (4.3.2)
$$

where $\|\cdot\|$ is the Euclidean norm, φ is a chosen radial function and n is the label attached to points in the time series. The coefficients $\lambda_n$ are chosen to satisfy the interpolation conditions $x_{n+1} = R(X_n)$. A special case of radial basis functions is the thin plate spline.
4. Neural networks - They are another class of functional representation. A standard feed-forward neural net with two hidden layers can be written as

$$
\begin{aligned}
x_{n+1} &= \sum_k w_k z_k - a_0, \qquad (4.3.3)\\
z_k &= \tanh\Big( \sum_j w_{kj}\, y_j - a_k \Big), \qquad (4.3.4)\\
y_j &= \tanh\Big( \sum_i w_{ji}\, x_i - a_j \Big), \qquad (4.3.5)
\end{aligned}
$$

where $y_j$ and $z_k$ are the values of the neurons in the two hidden layers and $x_i$ are the input neurons. The input neurons are the coordinates in state-space. A major disadvantage is that the parameters cannot be fit by solving a linear problem. As a result, fitting the parameters takes several orders of magnitude more computer time.
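To make the interpolation conditions of equation (4.3.2) concrete, the sketch below fits the coefficients $\lambda_n$ by solving the N × N linear system $\Phi\lambda = x$ with $\Phi_{mn} = \phi(\|X_m - X_n\|)$. The Gaussian choice $\phi(r) = e^{-(\varepsilon r)^2}$ and the width parameter `eps` are assumptions made for illustration; the text does not fix φ. Exact interpolation requires distinct centres, and the system becomes ill-conditioned when centres are close compared with 1/`eps`. Function names are hypothetical.

```python
import numpy as np

def _as_points(a):
    """Return an (n, k) array; a 1-D input is read as n scalar points."""
    a = np.asarray(a, dtype=float)
    return a[:, None] if a.ndim == 1 else a

def _gaussian(r, eps):
    return np.exp(-(eps * r) ** 2)

def fit_rbf(centers, targets, eps=1.0):
    """Coefficients lambda_n such that R(X_n) = targets[n] at every centre."""
    C = _as_points(centers)
    dist = np.linalg.norm(C[:, None, :] - C[None, :, :], axis=-1)
    return np.linalg.solve(_gaussian(dist, eps), np.asarray(targets, dtype=float))

def eval_rbf(X, centers, lam, eps=1.0):
    """R(X) = sum_n lambda_n * phi(||X - X_n||) for each row of X."""
    X, C = _as_points(X), _as_points(centers)
    dist = np.linalg.norm(X[:, None, :] - C[None, :, :], axis=-1)
    return _gaussian(dist, eps) @ lam

centers = np.array([[0.0], [0.5], [1.0]])
targets = np.array([0.0, 1.0, 0.0])
lam = fit_rbf(centers, targets, eps=3.0)
print(eval_rbf(centers, centers, lam, eps=3.0))   # reproduces the targets
print(eval_rbf([[0.25]], centers, lam, eps=3.0))  # value between the centres

pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
vals = np.array([0.0, 1.0, 1.0, 2.0])
lam2 = fit_rbf(pts, vals, eps=2.0)
```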
4.4 Global Approximation Technique

This is the first of the two approximation techniques to be discussed. After the reconstruction of the state-space, the chosen functional representation is used to approximate the scattered data points using one of the approximation techniques. The global approximation technique uses all the data points in the state-space to get the approximating function $\hat f$. The criterion generally used to get the parameters is the minimization of the total squared error, which reduces to a linear problem for polynomial and rational representations.


Thus, given a block of observed data $x_i$, i = 1, ..., N, one reconstructs the trajectory $X_i$, i = 1, ..., N-k+1, in a k-dimensional state-space, where $k \ge D_2$, $D_2$ being the correlation dimension, as

$$
X_i = [\, x_i,\; x_{i+1},\; x_{i+2},\; \ldots,\; x_{i+k-1} \,]. \qquad (4.4.1)
$$


The global approximation technique using the least squares criterion is then a problem of finding the parameters of the approximant $\hat f$ so that

$$
X_{i+1} \approx \hat f(X_i), \qquad 1 \le i \le N-k, \qquad (4.4.2)
$$

and

$$
e_j = \sum_{i=2}^{N-k+1} \left[ \pi_j X_i - \pi_j \hat f(X_{i-1}) \right]^2, \qquad j = 1, \ldots, k, \qquad (4.4.3)
$$

which is the total squared error for the j-th coordinate, is minimized for each coordinate j to obtain the parameters of the k approximating functions $\pi_j \hat f$.


In our particular case, we have studied the performance of polynomial representations. Moreover, we have used a delay m = 1 for reconstructing the state-space as given by equation (4.4.1). Therefore, we have

$$
\pi_j X_{i+1} = \pi_{j+1} X_i, \qquad j = 1, \ldots, k-1. \qquad (4.4.4)
$$

We preserve the above structure in the approximant $\hat f$. This constraint is introduced to reduce the model order from the data compression point of view.
Thus, for polynomial representations, we have

$$
\pi_j X_{i+1} = \pi_j \hat f(X_i) = \pi_{j+1} X_i, \qquad j = 1, \ldots, k-1,\; i = 1, \ldots, N-k, \qquad (4.4.5)
$$

$$
\pi_k X_{i+1} = \pi_k \hat f(X_i) = c + \sum_{j=1}^{k} a_j\, \pi_j X_i + \sum_{j=1}^{k} \sum_{l=1}^{j} a_{jl}\, \pi_j X_i\, \pi_l X_i. \qquad (4.4.6)
$$


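Obtaining the parameters of the approximating function $\pi_k \tilde{f}$ with the least squares criterion requires solving the following simultaneous linear equations:

$$\sum_{i=1}^{N-k} \pi_k X_{i+1} = \sum_{i=1}^{N-k} \pi_k \tilde{f}(X_i) \qquad (4.4.7)$$

$$\sum_{i=1}^{N-k} \pi_k X_{i+1}\, \pi_j X_i = \sum_{i=1}^{N-k} \pi_k \tilde{f}(X_i)\, \pi_j X_i, \qquad j = 1, \ldots, k \qquad (4.4.8)$$

$$\sum_{i=1}^{N-k} \pi_k X_{i+1}\, \pi_j X_i\, \pi_l X_i = \sum_{i=1}^{N-k} \pi_k \tilde{f}(X_i)\, \pi_j X_i\, \pi_l X_i, \qquad j = 1, \ldots, k, \quad l = 1, \ldots, j \qquad (4.4.9)$$

where $\pi_k \tilde{f}(X_i)$ is given by equation (4.4.6).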
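To make equations (4.4.5) to (4.4.9) concrete, the following minimal sketch forms and solves the normal equations for the last coordinate $\pi_k \tilde{f}$ of a quadratic approximant with delay 1. It assumes pure Python with no numerical library. The function names (`delay_vectors`, `quadratic_features`, `solve_normal_equations`, `fit_global_quadratic`, `predict_next`) are hypothetical. The Hénon map is used only as a test signal, not as one of the phoneme series studied here. It is exactly quadratic in delay coordinates, so the fit should recover its coefficients.

```python
def delay_vectors(x, k):
    """Delay reconstruction with delay 1, as in equation (4.4.1)."""
    return [x[i:i + k] for i in range(len(x) - k + 1)]


def quadratic_features(X):
    """Terms of (4.4.6): 1, pi_j X, then pi_j X * pi_l X for l <= j."""
    k = len(X)
    feats = [1.0] + list(X)
    for j in range(k):
        for l in range(j + 1):
            feats.append(X[j] * X[l])
    return feats


def solve_normal_equations(A, b):
    """Gaussian elimination with partial pivoting on a copy of [A | b]."""
    n = len(b)
    M = [row[:] + [b[r]] for r, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    sol = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * sol[c] for c in range(r + 1, n))
        sol[r] = (M[r][n] - tail) / M[r][r]
    return sol


def fit_global_quadratic(x, k):
    """Least-squares coefficients of pi_k f~ via (4.4.7)-(4.4.9).

    Coordinates 1..k-1 need no fitting: by (4.4.5) they are pure shifts.
    """
    X = delay_vectors(x, k)
    p = len(quadratic_features(X[0]))      # p = 1 + k + k(k+1)/2
    A = [[0.0] * p for _ in range(p)]
    b = [0.0] * p
    for i in range(len(X) - 1):            # i = 1, ..., N-k in the text
        phi = quadratic_features(X[i])
        target = X[i + 1][-1]              # pi_k X_{i+1} = x_{i+k}
        for r in range(p):
            b[r] += phi[r] * target
            for c in range(p):
                A[r][c] += phi[r] * phi[c]
    return solve_normal_equations(A, b)


def predict_next(coeffs, X):
    """One-step prediction of the new (k-th) coordinate from state X."""
    return sum(a * f for a, f in zip(coeffs, quadratic_features(X)))


# Test signal: Henon map, x_{n+1} = 1 - 1.4 x_n^2 + 0.3 x_{n-1}.
def henon_series(n, transient=100):
    x, y, out = 0.0, 0.0, []
    for t in range(n + transient):
        x, y = 1.0 - 1.4 * x * x + y, 0.3 * x
        if t >= transient:
            out.append(x)
    return out


series = henon_series(300)
coeffs = fit_global_quadratic(series, k=2)
# Feature order for k = 2: [1, x_i, x_{i+1}, x_i^2, x_{i+1} x_i, x_{i+1}^2]
print([round(c, 6) for c in coeffs])
```

With $k = 2$ the fitted coefficients should be close to $c = 1$, $a_1 = 0.3$ and $a_{22} = -1.4$, with all other terms close to zero.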

The following section discusses the results of applying the global approximation technique to speech signals in the form of phoneme utterances.


4.5 Results and Discussion of the Global 
Approximation Technique 


The global approximation technique, combined with the state-space reconstruction method discussed earlier, provides a one-step predictor. We compared the performance of this prediction scheme with that of the LPC. For the comparison we used the covariance method of the LPC, implemented with a standard algorithm based on Cholesky decomposition. The comparison was made in terms of the prediction error E given by equation (4.1.3), for the same model order p and blocklength N.

For polynomial approximating functions, the model order p is obtained as follows. Let the embedding dimension be k, i.e., the trajectory $X_i$, $i = 1, 2, \ldots, N-k+1$, in the reconstructed space is k-dimensional. For the specific constraint described in Section 4.4, which is the case we have studied, the model orders for various degrees of the approximating polynomial $\tilde{f}$ are:

1. Linear: $p = 1 + k$,

2. Quadratic: $p = 1 + k + k(k+1)/2$,

3. Cubic: $p = 1 + k + k(k+1)/2 + k(k+1)(k+2)/6$,

etc.

Thus, once the degree of the polynomial and the embedding dimension are given, the model order is fixed.
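The list above is the general count of coefficients in a polynomial of degree m in k variables, which is $\binom{k+m}{m}$. The short sketch below (function names hypothetical; `math.comb` requires Python 3.8 or later) computes the count both in closed form and degree by degree, matching the terms listed above.

```python
from math import comb


def model_order(k, m):
    """Coefficients in a degree-m polynomial in k variables: C(k+m, m)."""
    return comb(k + m, m)


def model_order_by_degree(k, m):
    """Same count, summed degree by degree as in the list above:
    there are C(k+d-1, d) monomials of exact degree d."""
    return sum(comb(k + d - 1, d) for d in range(m + 1))


for k in (3, 4, 5):
    print(k, [model_order(k, m) for m in (1, 2, 3)])
# 3 [4, 10, 20]
# 4 [5, 15, 35]
# 5 [6, 21, 56]
```

For a quadratic with k = 3 this gives p = 10, and with k = 5 it gives p = 21. These are the parameter sizes used in the comparisons that follow.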

The advantage of this modeling scheme over other schemes lies in the proper choice of the embedding dimension. If the attractor dimension is D, the choice of the embedding dimension k is governed by two conditions:

(i) $k \ge D$, k being an integer (a necessary condition for preserving a one-to-one relationship);

(ii) $k \ge 2r+1$, where r is the dimension of the manifold containing the attractor (a sufficient condition).

However, $k \ge 2r+1$ is not required for most purposes. For prediction, one can increase k by 1 at a time to see how the prediction error varies. For phonemes with $D_2 \approx 2.0$, the greatest drop in prediction error E was observed when k was changed from 1 to 2 (for representative examples, see Table 4.1). After that, only minor drops in E were observed as k was increased further. For phonemes with larger values of D, e.g., phoneme /z/ (Table 4.1(c)), E continued to drop until $k \approx D$. This method has also been suggested for finding the required value of the embedding dimension k.

For reconstructing the state-space, the embedding dimension k was chosen as follows. If D < 3.0, then k = 3 was used. Otherwise, k = D' was chosen, where D' is the next higher integer after D.

For real-time use of this modeling scheme, a common value of k chosen a priori is desirable. This is because calculating the attractor dimension for each block of data is time consuming. It is also partly manual, since the linear range of slopes in the log C(r) vs. log r plots above the noise threshold must be selected by hand.

If k > D, a further increase of k leads to only a slight drop in prediction error. On the other hand, if k < D, an appreciable decrease may occur. If the degree of the approximating polynomial is m, then the number of parameters to be fitted is $(k+m)!/(k!\,m!)$, which depends heavily on k and m. Hence, a blanket choice of k greater than the attractor dimension of all phonemes is costly in terms of computation. As most phonemes have $D_2 < 4.0$, a compromise choice of k = 4 is suggested for speech modeling. Regions that would require k > 4 may therefore not be approximated very well.

Embedding Dimension    Prediction Error    % Drop in Prediction Error
                                           from previous Embedding Dimension
1                      2.00 x 10^-?        -
2                      3.73 x 10^-?        436 %
3                      2.52 x 10^-?        48 %
4                      2.27 x 10^-?        11 %
5                      2.17 x 10^-?        4.6 %
6                      2.20 x 10^-?        -1.36 %

(a)

Embedding Dimension    Prediction Error    % Drop in Prediction Error
1                      8.88 x 10^-?        -
2                      1.99 x 10^-?        346 %
3                      1.81 x 10^-?        9.9 %
4                      1.75 x 10^-?        3.4 %
5                      1.59 x 10^-?        10.1 %

(b)

Embedding Dimension    Prediction Error    % Drop in Prediction Error
1                      2.40 x 10^-?        -
2                      1.81 x 10^-?        32.6 %
3                      1.32 x 10^-?        37.1 %
4                      9.80 x 10^-?        34.7 %
5                      9.60 x 10^-?        2.1 %
6                      9.25 x 10^-?        3.8 %
7                      9.21 x 10^-?        -0.44 %

(c)

Embedding Dimension    Prediction Error    % Drop in Prediction Error
1                      4.70 x 10^-?        -
2                      1.83 x 10^-?        157 %
3                      1.73 x 10^-?        5.8 %
4                      1.67 x 10^-?        3.6 %
5                      1.60 x 10^-?        4.3 %

(d)

Table 4.1 - The prediction error E with increasing embedding dimension k, for blocklength N = 300 and a quadratic polynomial, for the following phonemes:
(a) phoneme /u/, correlation dimension $D_2 = 2.00 \pm 0.10$;
(b) phoneme /a/, $D_2 = 1.88 \pm 0.05$;
(c) phoneme /z/, $D_2 = ?.99 \pm 0.31$;
(d) phoneme /k/, $D_2 = 2.69 \pm 0.39$.
For low $D_2$, the greatest drop in prediction error occurs when k changes from 1 to 2. For phonemes with higher $D_2$, e.g., (c), the drop continues up to $k \approx D_2$. The advantage of this modeling approach comes from choosing the proper k (see the text on the choice of embedding dimension).
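Two small helpers make the bookkeeping above concrete. The first encodes the rule used here for choosing k: k = 3 if D < 3.0, otherwise the next integer above D. The second computes a "% drop" column like the one in Table 4.1. The formula $100\,(E_{k-1} - E_k)/E_k$, i.e. the drop measured relative to the new, smaller error, is an assumption: the text does not state it, but it reproduces the entries of Table 4.1(b). The function names are hypothetical.

```python
import math


def choose_embedding_dimension(D):
    """Rule from the text: k = 3 if D < 3.0, else the next integer above D."""
    if D < 3.0:
        return 3
    return math.floor(D) + 1


def percent_drop(errors):
    """Drop from one embedding dimension to the next, relative to the
    new error: 100 * (E_{k-1} - E_k) / E_k (assumed convention)."""
    return [100.0 * (prev - cur) / cur for prev, cur in zip(errors, errors[1:])]


# Table 4.1(b) mantissas, assuming a common power of ten (which cancels).
errors_b = [8.88, 1.99, 1.81, 1.75, 1.59]
print([round(d, 1) for d in percent_drop(errors_b)])   # [346.2, 9.9, 3.4, 10.1]
print(choose_embedding_dimension(2.69))                # 3
```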

To compare the performance of the global approximation technique with that of the LPC (covariance method), we considered blocklengths N = 50, 100, 200, 300, 400, 600, etc. For global approximation, a quadratic polynomial with k = 3 gives a parameter size of p = 10. In all, 10 phonemes were studied: 5 vowels and 5 consonants. The consonants included one each of voiced and unvoiced fricatives and stops. The phonemes are /u/, /U/, /o/, /o/, /a/, /6/, /^/, /z/, /k/ and /g/. In all cases, two forms of prediction were studied for different blocklengths N:

- direct prediction, in which the blocklength used to construct the approximating polynomial equals the prediction length;
- iterative prediction, in which the blocklength used to construct the approximating polynomial is less than the prediction length.

The general observations from the numerical computations are summarized below. Figures 4.1 and 4.2 compare the error sequences produced by the LPC and by the global approximation technique.

1. With $N \ge 300$ and p = 10, for vowels, the LPC gave 2 to 10 times (more often 2 to 4 times) more prediction error than the global approximation technique (Table 4.2 gives representative examples).



Blocksize   Number of   Pred. Error    Pred. Error using   % Drop in Prediction Error
            Blocks      using LPC      global approx.      with global approx. technique
50          27          4.94 x 10^-?   2.11 x 10^-?        2241
100         13          1.13 x 10^-?   2.44 x 10^-?        363
200         6           1.61 x 10^-?   2.48 x 10^-?        549
300         4           7.21 x 10^-?   2.51 x 10^-?        167

(a)

50          16          1.93 x 10^-?   9.41 x 10^-?        1951
100         24          3.23 x 10^-?   1.24 x 10^-?        160
200         14          2.24 x 10^-?   1.32 x 10^-?        69.7
300         9           1.51 x 10^-?   1.44 x 10^-?        4.9
600         4           1.57 x 10^-?   1.48 x 10^-?        6.1

(b)

50          24          2.47 x 10^-?   5.38 x 10^-?        359
100         12          4.64 x 10^-?   9.30 x 10^-?        399
200         6           2.02 x 10^-?   1.16 x 10^-?        74.1
300         4           1.50 x 10^-?   1.24 x 10^-?        21.0

(c)

Table 4.2 - Comparison of the prediction error of the LPC against that of the global approximation technique (quadratic polynomial) for specific cases:
(a) phoneme /u/, correlation dimension $D_2 = 2.00 \pm 0.1$, embedding dimension k = 3, so the parameter size is p = 10;
(b) phoneme /k/, $D_2 = 2.69 \pm 0.39$, k = 3, p = 10/block;
(c) phoneme /z/, $D_2 = ?.99 \pm 0.31$, k = 5, p = 21/block.
2. With $50 \le N \le 100$ and p = 10, for vowels, the LPC gave 2 to 5 times more prediction error than the global approximation technique.

3. With $N \ge 200$ and p = 10, 20, etc., for consonants, the LPC gave 0.9 to 2 times the prediction error of the global approximation technique.

4. With $50 \le N \le 100$ and p = 10, 20, etc., for consonants, the LPC generally gave 2 to 10 times more prediction error than the global approximation technique.

5. For all phonemes, the prediction error using linear polynomials was generally 1.02 to 2.0 times that using quadratic polynomials for the same blocks (see Table 4.3 for the specific example of phoneme /€/).

6. We also compared the performance of the direct method with that of the iterative method. The iterative method may be used when less computational effort is needed. The prediction error of the iterative method is 1.02 to 2.0 times that of the direct method. As expected, for a fixed prediction length (e.g., $N_p = 600$), the prediction error decreases as the blocklength used to construct the approximating polynomial increases (e.g., $N_B$ = 50, 100, 200, etc.). Table 4.4 shows the prediction error of both the direct and the iterative methods for different blocklengths $N_B$ and prediction lengths $N_p$ for the phoneme /k/.



Blocklength   Number of   Pred. Error using     Pred. Error using
              Blocks      Linear Polynomial     Quadratic Polynomial
50            55          1.87 x 10^-?          1.85 x 10^-?
100           29          2.04 x 10^-?          1.75 x 10^-?
200           14          2.13 x 10^-?          1.92 x 10^-?
300           9           2.20 x 10^-?          2.02 x 10^-?
600           5           2.14 x 10^-?          1.99 x 10^-?

Table 4.3 - Comparison of the prediction error for different blocklengths, using a linear approximating polynomial vs. a quadratic polynomial, for phoneme /8/, correlation dimension $D_2 = 2.65 \pm 0.08$. Embedding dimension k = 3 was used, so the parameter sizes for the linear and quadratic polynomial approximations were p = 4 and p = 10, respectively.



Pred. Length   Pred. Error using   Blocklength N_B to construct   Pred. Error using
N_p            Direct Method       approx. polynomial             Iterative Method
50             2.55 x 10^-?        50                             -
100            1.96 x 10^-?        50                             2.59 x 10^-?
200            3.22 x 10^-?        50                             4.08 x 10^-?
300            4.29 x 10^-?        50                             5.83 x 10^-?
400            4.63 x 10^-?        50                             6.33 x 10^-?
100            1.96 x 10^-?        100                            -
200            3.22 x 10^-?        100                            3.48 x 10^-?
300            4.29 x 10^-?        100                            4.93 x 10^-?
400            4.63 x 10^-?        100                            5.47 x 10^-?
200            3.22 x 10^-?        200                            -
300            4.29 x 10^-?        200                            4.32 x 10^-?
400            4.63 x 10^-?        200                            4.69 x 10^-?
500            5.14 x 10^-?        200                            5.24 x 10^-?
600            4.63 x 10^-?        200                            4.83 x 10^-?
300            4.29 x 10^-?        300                            -
400            4.63 x 10^-2        300                            4.64 x 10^-?
500            5.14 x 10^-?        300                            5.17 x 10^-?
600            4.63 x 10^-?        300                            4.81 x 10^-?

Table 4.4 - Comparison of the prediction error of the direct and iterative methods for different blocklengths $N_B$ and prediction lengths $N_p$, for the specific case of phoneme /k/, correlation dimension $D_2 = 2.69 \pm 0.39$, using quadratic approximating polynomials.




Figures 4.1 and 4.2 - Comparison of the error sequences due to the LPC and the global approximation technique (parameter size = 10).
The performance of the global approximation technique depends on the choice of the functional representation $\tilde{f}$. Many practical functional representations exist, some of which were discussed in Section 4.3. As a first step in this modeling scheme, our study was limited to polynomial representations. For $\tilde{f}$ to be a good approximation to the actual function f, $\tilde{f}$ must follow the variations of f. The dependence on the choice of approximating function is reduced in the local approximation technique, which is described in the following section. In addition, the global approximation technique predicts better than the LPC. This motivates us to study the local approximation technique, which intuitively should predict even better.

4.6 Local Approximation Technique 

The basic idea here is to break up the state-space into local neighbourhoods and fit different parameters in each neighbourhood. Intuitively, then, local approximation should produce better fits than global approximation for a given number of data points, particularly for large blocklengths.

Farmer and Sidorowich [66,67] suggested the use of nearest neighbour approximation for modeling chaotic dynamics. Their initial results show that global representations cannot decrease the prediction error beyond a certain point, even when more parameters or more data are added. The local approximation technique provides a way to use a chosen representation efficiently. Beyond a certain point, adding more neighbourhoods reduces the prediction error more than adding parameters and moving to a higher degree representation.


There are three basic steps in implementing the local approximation technique in the neighbourhood of a point X in state-space:

1. Pick a local functional representation.

2. Assign neighbourhoods.

3. Find a local chart that maps the points in each neighbourhood into their future values. To make a prediction, evaluate the chart at X.

The basic idea is represented in Figure 4.3.

Figure 4.3 - Implementing the local approximation technique. Suppose one wants to predict the unknown next state from the known state $X_j$. Find the neighbourhood of $X_j$, denoted in this case by the black dots in the circle. Choose a functional representation $\tilde{f}$ and fit its parameters to these points using, for example, the least squares error criterion. To make a prediction, evaluate $\tilde{f}$ at $X_j$.

A simple way to assign neighbourhoods is to partition the domain into disjoint sets, for example using a rectangular grid. This approach is convenient, but the neighbourhoods do not overlap. There is therefore no continuity between the function of one neighbourhood and that of the next. One way to overcome this problem is to introduce matching conditions between adjacent neighbourhoods. This becomes difficult for data in more than two dimensions, a situation which arises very often in our case.

An alternative that is more accurate than disjoint partitions and more convenient than matching conditions is to overlap the neighbourhoods, so that each function is constructed from a good set of neighbours. A point $Y_i$ is said to be in the neighbourhood of point X if $|X - Y_i| < R$, where R is a fixed radius.

Once the neighbourhood of point X has been found, one fits the parameters of the chosen representation using the points in the neighbourhood. For polynomials and the least squares error criterion, the analysis is similar to that of the global approximation technique in Section 4.4.
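Below is a minimal sketch of a single local prediction with an overlapping neighbourhood of fixed radius R. It assumes an affine (degree-1) local representation rather than the quadratic used in the global case, which keeps the local normal equations small (k + 1 unknowns). The function names and the logistic-map test signal are hypothetical, chosen only for illustration.

```python
import math


def gauss_solve(A, b):
    """Gaussian elimination with partial pivoting on a copy of [A | b]."""
    n = len(b)
    M = [row[:] + [b[r]] for r, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    sol = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * sol[c] for c in range(r + 1, n))
        sol[r] = (M[r][n] - tail) / M[r][r]
    return sol


def local_affine_predict(x, k, query, R):
    """Predict the value following state `query` from an affine least-squares
    fit to the neighbours Y_i with |query - Y_i| < R (delay 1)."""
    X = [x[i:i + k] for i in range(len(x) - k + 1)]
    pairs = [(X[i], X[i + 1][-1]) for i in range(len(X) - 1)
             if math.dist(X[i], query) < R]
    p = k + 1                                  # constant term + k slopes
    if len(pairs) < p:
        raise ValueError("too few neighbours; increase R")
    A = [[0.0] * p for _ in range(p)]
    b = [0.0] * p
    for Y, target in pairs:
        phi = [1.0] + list(Y)
        for r in range(p):
            b[r] += phi[r] * target
            for c in range(p):
                A[r][c] += phi[r] * phi[c]
    coeffs = gauss_solve(A, b)
    return sum(a * f for a, f in zip(coeffs, [1.0] + list(query)))


def logistic_series(n, r=3.9, x0=0.3):
    out, x = [], x0
    for _ in range(n):
        x = r * x * (1.0 - x)
        out.append(x)
    return out


series = logistic_series(2000)
pred = local_affine_predict(series, k=1, query=[0.3], R=0.03)
print(pred, 3.9 * 0.3 * 0.7)   # local estimate vs. the true next value
```

A straight line cannot represent the quadratic logistic map globally, but within a small radius it comes close. This is the intuition behind local approximation.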

Farmer and Sidorowich classify local approximation techniques according to the order of the derivatives on which the errors depend. Suppose the approximating function $\tilde{f}$ is a polynomial of degree m, and the function f being approximated is not itself a polynomial of degree m. In the ideal case the errors are then proportional to $\delta^{m+1}$, where $\delta$ is the spacing between the data points. The average spacing between N points uniformly distributed over a D-dimensional space is $\delta \sim N^{-1/D}$. Calling q the order of approximation, we have

$$E \sim N^{-q/D} \qquad (4.6.1)$$

where E is the r.m.s. prediction error, and in this case q = m + 1.

Farmer and Sidorowich then argue that the ideal case q = m + 1 is difficult to achieve for large q, since in general fitting a polynomial of degree m does not produce a fit accurate to order m + 1. They therefore use equation (4.6.1) to define the order of local approximation, taking the limit as $N \to \infty$ and letting D be the information dimension:

$$q = \lim_{N \to \infty} \frac{D\, |\log E|}{\log N} \qquad (4.6.2)$$

In general, q depends on D, on f, on the way the neighbourhoods are chosen, and on other factors.
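Equation (4.6.2) needs the limit because any constant prefactor C in $E = C\,N^{-q/D}$ contributes a term $D \log C / \log N$ that vanishes only as $N \to \infty$. The sketch below (function names hypothetical, synthetic errors rather than measured ones) compares a single-point evaluation of (4.6.2) with an estimate from the slope of $\log E$ against $\log N$ between two blocklengths. In the slope estimate the prefactor cancels.

```python
import math


def order_single_point(N, E, D):
    """q ~ D |log E| / log N: equation (4.6.2) evaluated at one finite N."""
    return D * abs(math.log(E)) / math.log(N)


def order_from_slope(N1, E1, N2, E2, D):
    """q from the slope of log E versus log N, using E ~ N^(-q/D) (4.6.1)."""
    slope = (math.log(E2) - math.log(E1)) / (math.log(N2) - math.log(N1))
    return -D * slope


def error_model(N, q=3.0, D=2.0, C=5.0):
    """Synthetic errors following E = C * N^(-q/D)."""
    return C * N ** (-q / D)


print(order_from_slope(100, error_model(100), 10_000, error_model(10_000), 2.0))  # close to 3
print(order_single_point(10_000, error_model(10_000), 2.0))  # below 3 because of C
```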


The local approximation technique is more difficult to implement than the global approximation technique. With disjoint neighbourhoods, an approximating function has to be determined for every neighbourhood, instead of a single function for global approximation. With overlapping neighbourhoods, an approximating function has to be determined at every point to predict the next point. This is acceptable for applications like weather forecasting, where the sole aim is better prediction. It is impractical for applications like data transmission, where the model order must be kept under control. In the following section, we propose a compromised overlapping neighbourhood (CON) local approximation technique and compare its prediction properties with those of global approximation.

4.7 A Compromised Overlapping Neighbourhood (CON) - Local Approximation Technique

The proposed algorithm tries to retain the better prediction properties of the local approximation technique while keeping the model order low for a fixed blocklength N. It is similar to the local approximation technique except in the choice of neighbourhoods. As seen in the previous section, the overlapping neighbourhood method finds the set of points $Y_i$ such that $|X - Y_i| < R$ for a fixed R. The compromised overlapping method modifies this scheme in two respects:

- Each neighbourhood must include a certain minimum number NMINNEW of new points. If, for the chosen radius MRAD, the number of new points NMRAD in the neighbourhood is less than NMINNEW, then the (NMINNEW - NMRAD) next nearest points are added to the neighbourhood.
- The same functional representation is used for all points in the neighbourhood.

Thus, for a given blocksize N, at most $\lceil N/\mathrm{NMINNEW} \rceil$ functions are used to model the block, which sets an upper limit on the model order.

The algorithm below gives the details of the implementation procedure, with discussion at appropriate places.

The Algorithm

STEP 1 Read one block of data x_i, i = 1, ..., N.

STEP 2 Reconstruct the trajectory X_i, i = 1, ..., N-k+1, in a state-space of dimension k.

STEP 3 Input the constants MRAD and NMINNEW. MRAD is the minimum radius of the neighbourhood around a data point X_L. NMINNEW is the minimum number of new points to be included in the neighbourhood. A point is "new" if no previous neighbourhood contained it.

STEP 4 Initialize an array TALLY[1..N-k] with all 1's. An entry of 1 in location TALLY[I] indicates that X_I has not yet been considered for prediction.

STEP 5 i) J = 0;
ii) J = J + 1, until TALLY[J] = 1.
J is the index of the first data point that still has to be modeled for prediction.

STEP 6 i) COUNTER = 0, L = J + 1;
ii) COUNTER = COUNTER + 1;
if TALLY[L] = 0 then L = L + 1 until TALLY[L] = 1;
SCHECK[COUNTER] = L;
iii) if L = J + 1 then RCHECK = |X_J - X_L|, else
RCHECK = max{ |X_J - X_I| : I = SCHECK[1], ..., SCHECK[COUNTER] };
iv) if RCHECK < MRAD, then
L = L + 1, goto ii)
else proceed to STEP 7.

STEP 6 finds a new point X_L approximately MRAD away from the first new point X_J, and the new neighbourhood is created about X_L. This ensures that as many new points as possible (i.e., points not considered in any previous neighbourhood) are included. See Figure 4.4.

STEP 7 Find the neighbourhood {X_I} of point X_L as follows. The points {X_I} in the neighbourhood of X_L are:
either all points with |X_L - X_I| < MRAD, if NONEWPTS >= NMINNEW, where NONEWPTS is the number of new points in the neighbourhood;
or, if NONEWPTS < NMINNEW, those points plus (NMINNEW - NONEWPTS) further new points {X_I}. These are the next nearest neighbours of X_L in the usual sense, i.e., TALLY[I] = 1 and
|X_L - X_I| < |X_L - X_j| for all X_j not in {X_I}.

Figure 4.4 - The new neighbourhood is created about point X_L rather than X_J, so that more new points are included in the neighbourhood. This in turn helps to reduce the model order.

STEP 8 i) Store the neighbourhood values {X_I} and the succeeding points {X_{I+1}} in matrix VAL;
ii) set TALLY[I] = 0 for all I such that X_I is in the neighbourhood.
Thus, at least NMINNEW new points are in each new neighbourhood.

STEP 9 Solve for the parameters of the approximating function $\tilde{f}$.

STEP 10 Evaluate $\tilde{f}$ at the points X_I in the neighbourhood.

STEP 11 CHK = true;
for COUNT = 1 to N-k do
if TALLY[COUNT] = 1 then CHK = false;
if CHK = false then goto STEP 5
else find the error sequence and compute the prediction error.

END OF ALGORITHM
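The neighbourhood assignment of STEPs 4 to 8 can be sketched in Python as follows. The sketch is simplified and rests on several assumptions:

- Indices are 0-based.
- The STEP 6 walk stops at the first uncovered point whose distance from X_J is at least MRAD. This is equivalent to the running maximum RCHECK, since all earlier points on the walk were closer than MRAD.
- As an added guard not in the algorithm above, the walk falls back to the last uncovered point if the block ends first.
- Points within MRAD of the centre are kept in the neighbourhood even if they are already covered.

Fitting (STEP 9) would then apply a least-squares fit such as (4.4.7)-(4.4.9) to each neighbourhood's points. The function name and test signal are hypothetical.

```python
import math


def con_neighbourhoods(X, mrad, nminnew):
    """Assign compromised overlapping neighbourhoods (STEPs 4-8, 0-based).

    X is the reconstructed trajectory; only points with a successor
    (indices 0 .. len(X)-2) need to be modeled.
    Returns a list of (centre index, sorted member indices).
    """
    if mrad <= 0 or nminnew < 1:
        raise ValueError("need mrad > 0 and nminnew >= 1")
    n = len(X) - 1
    tally = [1] * n                       # STEP 4: 1 = not yet modeled
    hoods = []
    while 1 in tally:                     # STEP 11: repeat until all covered
        j = tally.index(1)                # STEP 5: first point not yet modeled
        centre = j                        # STEP 6: walk forward from X_J
        for l in range(j + 1, n):
            if tally[l] == 0:
                continue
            centre = l
            if math.dist(X[j], X[l]) >= mrad:
                break
        # STEP 7: every point within MRAD of the centre ...
        members = [i for i in range(n) if math.dist(X[centre], X[i]) < mrad]
        new = sum(tally[i] for i in members)
        if new < nminnew:
            # ... topped up with the nearest points not yet modeled
            inside = set(members)
            rest = sorted((i for i in range(n) if tally[i] and i not in inside),
                          key=lambda i: math.dist(X[centre], X[i]))
            members += rest[:nminnew - new]
        for i in members:                 # STEP 8(ii): mark as modeled
            tally[i] = 0
        hoods.append((centre, sorted(members)))
    return hoods


def logistic_series(n, r=3.9, x0=0.3):
    out, x = [], x0
    for _ in range(n):
        x = r * x * (1.0 - x)
        out.append(x)
    return out


series = logistic_series(601)
X = [series[i:i + 2] for i in range(len(series) - 1)]    # k = 2, delay 1
hoods = con_neighbourhoods(X, mrad=0.2, nminnew=50)
print(len(hoods), math.ceil((len(X) - 1) / 50))          # count vs. upper bound
```

Each pass covers at least min(NMINNEW, remaining) new points. The number of neighbourhoods, and hence of fitted functions, therefore cannot exceed the ceiling of (number of points)/NMINNEW, which is the bound stated in the text.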


The prediction properties of this algorithm were compared with those of the global approximation technique. The study covered eight phonemes (4 vowels and 4 consonants), with two blocklengths (N = 600 and 1000) taken at different locations in each time series. For the global approximation technique, the blocklengths $N_G$ were chosen so that both techniques had the same model order. This was done as follows:

- The total number of neighbourhoods formed in the CON-local approximation method, NBD, was noted. This is equivalently the number of functions fitted to the entire blocklength.
- The global approximation technique used the same functional representation and the same embedding dimension k. Its blocklengths were therefore set to $N_G = N/\mathrm{NBD}$.
- The prediction error was computed over NBD blocks.

The comparison of prediction error was thus made for the same blocklength N and model order p. The results may be summarized as follows.


In 80% of the cases studied, the CON-local approximation technique gave a lower prediction error than the global approximation technique over the entire blocklength (N = 600 and 1000). In some cases this improvement exceeded 100%. In the 20% of cases where global approximation was better, the worst case was an increase in prediction error of 16.5%.

Table 4.5 shows the comparison for representative blocks of all eight phonemes studied. The table shows that the improvement is more pronounced for vowels than for consonants. Figures 4.5 and 4.6 show the error sequences of the CON-local approximation technique and the global approximation technique for identical blocklengths and model order.

In terms of prediction error, the comparison thus shows two results. The global approximation technique is almost invariably better than the LPC. In 80% of cases, the CON-local approximation technique is better than the global approximation technique. Both comparisons are for the same blocklength N and model order p. Another important criterion in choosing a modeling scheme is computational complexity, which the following section discusses for the above modeling schemes.

4.8 Computational Complexity Considerations 

In the autocorrelation formulation of the LPC, as discussed in Chapter 1, the data outside a block of length $N_2$ are forced to zero when the data within the block are modeled. As a result, the error sequence has a large magnitude towards the ends of the block. For this reason, the covariance method was used when comparing prediction error with the global approximation technique. However, the autocorrelation formulation leads to a Toeplitz matrix, which can be solved more efficiently than the matrix obtained with the covariance method. Below, the computational complexities of the autocorrelation and covariance methods of the LPC (see [79,81] for discussions) are compared with that of the global approximation technique. The CON-local approximation technique is also briefly discussed.
S.No.  Phoneme  Embedding   NMINNEW  Total no.  Blocksize  % Improvement in pred. error
                Dimension            of nbds               over global approx.
1      /u/      3           50       10         600        20.6
2      /u/      3           100      9          600        16.4
3      /u/      3           50       5          1000       16.3
4      /u/      3           100      17         1000       15.6
5      /U/      3           50       11         600        18.0
6      /U/      2           100      5          600        28.2
7      /U/      3           50       18         1000       27.5
8      /U/      3           100      9          1000       44.3
9      /o/      3           50       11         600        47.9
10     /o/      3           100      5          600        14.9
11     /o/      3           50       17         1000       65.3
12     /o/      3           100      10         1000       31.2
13     /a/      3           50       11         600        92.3
14     /a/      3           100      5          600        103.0
15     /a/      3           50       19         1000       113.0
16     /a/      3           100      9          1000       86.7
17     /e/      3           50       12         600        18.3
18     /e/      3           100      6          600        3.2
19     /e/      3           50       16         1000       51.9
20     /e/      3           100      10         1000       22.0
21     /S/      3           50       11         600        -10.5
22     /S/      3           100      5          600        16.7
23     /S/      3           50       14         1000       5.7
24     /S/      3           100      9          1000       9.8
25     /z/      5           100      6          600        20.7
26     /z/      5           100      8          1000       -1.7

Table 4.5 - Representative examples of the percentage decrease in prediction error of the CON-local approximation technique over the global approximation technique, for different phonemes.
Figure 4.5 - Comparison of the error sequences of the global approximation technique (dashed line) and the CON-local approximation technique for phoneme /a/. Blocklength = 600, MRAD = 12, NMINNEW = 50. The prediction errors are 2.73 x 10^-? and 1.42 x 10^-?, respectively, i.e., an improvement of 92.3%. Only part of the error sequence is shown.

Figure 4.6 - Comparison of the error sequences of the global approximation technique (dashed line) and the CON-local approximation technique for phoneme /o/. Blocklength = 600, MRAD = 12, NMINNEW = 50. The prediction errors are 8.09 x 10^-? and 5.47 x 10^-?, respectively, i.e., an improvement of 47.9%. Only part of the error sequence is shown.
1. Autocorrelation Method of the LPC - This involves multiplying a block of data of length $N_2$ by a suitable window function, which requires $N_2$ multiplications. Obtaining the p model parameters requires solving the following set of p simultaneous equations:

$$\sum_{k=1}^{p} \alpha_k\, R(|i-k|) = R(i), \qquad 1 \le i \le p \qquad (4.8.1)$$

where

$$R(k) = \sum_{m=0}^{N_2-1-k} x(m)\, x(m+k) \qquad (4.8.2)$$

Obtaining the coefficients of the correlation matrix requires $O(N_2 p)$ multiplications. Because the matrix is Toeplitz, Durbin's method can be used to solve equation (4.8.1). This requires another $O(p^2)$ multiplications.
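A minimal sketch of the autocorrelation method follows: equation (4.8.2) for R(k), then Durbin's recursion for (4.8.1). It assumes the window has already been applied; the windowing step is omitted here. The function names are hypothetical. The recursion touches each $R(k)$ a bounded number of times per order, which gives the $O(p^2)$ cost quoted above.

```python
def autocorrelation(x, p):
    """R(k) = sum_{m=0}^{N-1-k} x(m) x(m+k), k = 0..p (equation (4.8.2))."""
    N = len(x)
    return [sum(x[m] * x[m + k] for m in range(N - k)) for k in range(p + 1)]


def durbin(R, p):
    """Levinson-Durbin recursion for sum_k a_k R(|i-k|) = R(i), i = 1..p.

    Returns (a_1..a_p, final prediction error energy).
    """
    a = [0.0] * (p + 1)                   # a[0] unused
    E = R[0]
    for i in range(1, p + 1):
        acc = R[i] - sum(a[j] * R[i - j] for j in range(1, i))
        k = acc / E                       # reflection (PARCOR) coefficient
        new = a[:]
        new[i] = k
        for j in range(1, i):
            new[j] = a[j] - k * a[i - j]
        a = new
        E *= (1.0 - k * k)
    return a[1:], E


R = [1.0, 0.5, 0.25]       # autocorrelation of an AR(1) process with coefficient 0.5
coeffs, err = durbin(R, 2)
print(coeffs, err)         # [0.5, 0.0] 0.75
```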


2. Covariance Method of the LPC - Modeling a block of length $N_2$ in terms of p coefficients requires solving

$$\sum_{k=1}^{p} \alpha_k\, \phi(i,k) = \phi(i,0), \qquad 1 \le i \le p \qquad (4.8.3)$$

where

$$\phi(i,k) = \sum_{m=-k}^{N_2-k-1} x(m)\, x(m+k-i), \qquad 1 \le i \le p, \quad 0 \le k \le p \qquad (4.8.4)$$

Again, obtaining the matrix coefficients requires $O(N_2 p)$ multiplications. Solving the p simultaneous equations (4.8.3) by Cholesky decomposition requires $O(p^3)$ multiplications.
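The sketch below shows the covariance method with a Cholesky solve of (4.8.3), in plain Python with hypothetical names. The first p samples serve as history, so no data are forced to zero. It computes each $\phi(i,k)$ directly, which costs more than the $O(N_2 p)$ quoted above; that count relies on updating $\phi(i+1,k+1)$ from $\phi(i,k)$ instead of recomputing it. As a check, a noise-free second-order recursion is used, for which the method recovers the coefficients exactly.

```python
import math


def covariance_matrix(x, p):
    """phi(i, k) = sum_{n=p}^{len(x)-1} x[n-i] x[n-k], for i, k = 0..p.

    Equivalent to (4.8.4) with the block starting at sample p.
    """
    return [[sum(x[n - i] * x[n - k] for n in range(p, len(x)))
             for k in range(p + 1)] for i in range(p + 1)]


def cholesky_solve(A, b):
    """Solve A a = b for symmetric positive-definite A via A = L L^T."""
    n = len(b)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][m] * L[j][m] for m in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    y = [0.0] * n                          # forward substitution: L y = b
    for i in range(n):
        y[i] = (b[i] - sum(L[i][m] * y[m] for m in range(i))) / L[i][i]
    a = [0.0] * n                          # back substitution: L^T a = y
    for i in range(n - 1, -1, -1):
        a[i] = (y[i] - sum(L[m][i] * a[m] for m in range(i + 1, n))) / L[i][i]
    return a


def lpc_covariance(x, p):
    """Solve sum_k a_k phi(i, k) = phi(i, 0), i = 1..p (equation (4.8.3))."""
    phi = covariance_matrix(x, p)
    A = [[phi[i][k] for k in range(1, p + 1)] for i in range(1, p + 1)]
    b = [phi[i][0] for i in range(1, p + 1)]
    return cholesky_solve(A, b)


x = [1.0, 0.5]
for n in range(2, 40):
    x.append(1.5 * x[n - 1] - 0.7 * x[n - 2])
print(lpc_covariance(x, p=2))   # recovers [1.5, -0.7] up to rounding
```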


3. State-Space Modeling using the Global Approximation Technique - This involves reconstructing the trajectory in a k-dimensional state-space, which requires no multiplications. We have studied the special case of second degree polynomial representations. The comparison with the covariance method was made for the same model order p and the same blocklength $N_2$. The least squares criterion requires solving the p simultaneous equations (4.4.7)-(4.4.9), where

$$p = 1 + k + \frac{k(k+1)}{2} \qquad (4.8.5)$$

The next step is to find the entries of the coefficient matrix. This requires

$$\frac{N_3}{4}\left(3k^4 + 10k^3 + 19k^2 + 12k\right)$$

multiplications, i.e., $O(N_3 p^2)$ multiplications, where $N_3 = N_2 - k$ and p is given by equation (4.8.5). Finally, solving the p simultaneous equations requires $O(p^3)$ multiplications.

4. State-Space Modeling using the CON-Local Approximation Technique - The exact computational complexity in terms of the blocklength, parameter size, etc. cannot be given, because the algorithm is dynamic. Instead, we give an idea of the computational complexity of its various stages.

The first step, reconstructing the trajectory in a k-dimensional state-space, requires no multiplications. From a block of length $N_2$, a vector time series of length $N_3 = N_2 - k$ is created. The most important step in this modeling scheme is breaking the trajectory into local neighbourhoods. The size of the local neighbourhoods depends on the radius MRAD and on the minimum number of new points per neighbourhood, NMINNEW. Equations (4.4.7)-(4.4.9) are solved for each neighbourhood. Obtaining the coefficient matrices for all neighbourhoods together still requires $O(N_3 p^2)$ multiplications, as in the global approximation case. Finally, solving the p simultaneous equations requires $O(p^3)$ multiplications for each neighbourhood.

The extra step in the CON-local approximation technique is finding the m nearest neighbours that make up the neighbourhood. By brute force this requires $O(N_3)$ steps, but the efficient k-d tree algorithm [61,62] reduces this to $O(\log N_3)$ steps. Obtaining the m nearest neighbours requires $O(m^2 k)$ multiplications. The value of m cannot be fixed a priori; it depends dynamically on MRAD, NMINNEW and the specific block of data.
Type of               Autocorrelation   Covariance    Global Approximation
Computation           Method            Method        Method
Windowing             N_2               0             0
Matrix Coefficients   O(N_2 p)          O(N_2 p)      O(N_3 p^2)
Matrix Solution       O(p^2)            O(p^3)        O(p^3)

Table 4.6 - Computational considerations for the autocorrelation and covariance formulations of the LPC and for the global approximation method of deterministic state-space modeling. $N_2$ denotes the blocklength, $N_3 = N_2 - k$, and p the parameter size.


Thus, as the prediction properties improve from one model to the next, so does the computational complexity. Table 4.6 summarizes the computational effort, in terms of multiplications, at the various stages of the autocorrelation, covariance and global approximation methods.



CHAPTER 5


CONCLUSION 


A comprehensive theory has been developing over the past one and a half
decades to describe a hitherto unexplained yet integral phenomenon of
dynamical systems called chaos. As emphasised at many places in the thesis, it is
now known that apparently random behaviour can result from the time evolution of
simple non-linear systems. Most of the initial efforts in chaos were directed
towards finding chaotic behaviour in specific systems. This approach, which analysed
the systems of differential equations governing the behaviour, was based on first
principles: quantities such as Lyapunov exponents, which describe chaotic
behaviour, were computed analytically to check for chaos. In the past two or three
years, however, attention has also turned to the modeling and prediction of
chaotic behaviour. The tools developed to analyse chaotic behaviour, e.g.
dimensions and entropies, can be applied to observed scalar time
series. The attractor dimension gives an idea of the number of independent
variables governing the system, while the entropy tells us whether the system is
in the chaotic regime. Even apparently random time series may have
low-dimensional attractors, implying that they are the outcome of a deterministic
system having few degrees of freedom. Thus, viewing complexity as arising out of
low-dimensional deterministic chaos gives a new tool for analysing
apparently random behaviour, distinct from the theory of random
processes. One can use Takens' theorems to reconstruct the state space from a
single observed variable and then model the observed behaviour using simple
deterministic non-linear modeling schemes.
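The delay-coordinate reconstruction referred to above can be sketched in a few lines. This is a minimal illustration, assuming a scalar series, an embedding dimension `dim` and an integer delay `tau` chosen beforehand (the thesis's own choices of these parameters are not reproduced here); `delay_embed` is a hypothetical helper name.

```python
import numpy as np


def delay_embed(x, dim, tau):
    """Delay-coordinate vectors [x[n], x[n+tau], ..., x[n+(dim-1)tau]].

    Returns an array of shape (len(x) - (dim - 1) * tau, dim); each row is
    one point of the reconstructed state space.
    """
    x = np.asarray(x, dtype=float)
    m = len(x) - (dim - 1) * tau
    if m <= 0:
        raise ValueError("series too short for this dim and tau")
    # column i holds the series shifted by i * tau samples
    return np.column_stack([x[i * tau : i * tau + m] for i in range(dim)])
```

For example, `delay_embed(np.arange(10), 3, 2)` gives six points, the first being [0, 2, 4] and the last [5, 7, 9].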
The thesis began with an attempt to find chaotic behaviour in
speech signals. This was because many utterances, e.g. fricatives, which
are produced by turbulent air flow through a narrow constriction, are difficult
to analyse and model using traditional techniques. Other utterances, e.g. vowels,
are considered regular because of prominent frequencies, called
formants, in their spectra. However, vowels synthesised from the extracted
formants sound artificial. This led us to believe that, although the formants are
prominent, the other frequencies cannot be ignored, even in vowels, by any
reconstruction process that attempts to retain the naturalness of the sound.

As pointed out in Chapter 3, a detailed model of the vocal tract should
consider the effects of time variation of the vocal tract shape, losses due to heat
conduction and viscous friction at the vocal tract walls, the softness of the vocal
tract walls, radiation of sound at the lips, nasal coupling, etc. However, such a
model has not yet been developed. Consequently, our analysis of dimensions and
entropy was based on actual phoneme utterances rather than on an
analysis of any of the existing models from first principles. Our analysis showed
that phoneme time series are generated by low-dimensional attractors. Moreover,
the analysis of the second-order entropy showed that most of the phoneme time
series are chaotic in nature.
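The dimension and second-order entropy estimates rest on the correlation sum C_d(r), the fraction of pairs of d-dimensional delay vectors closer than r. The correlation dimension is the slope of log C_d(r) against log r over a scaling region, and the Grassberger-Procaccia estimate of K2 compares C_d(r) with C_{d+1}(r). The sketch below assumes the maximum norm and evaluates a single (d, r) pair, whereas a real analysis scans both and looks for convergence. The function names are hypothetical and this is not the code used in the thesis.

```python
import numpy as np


def delay_embed(x, dim, tau=1):
    """Rows are delay vectors [x[n], x[n+tau], ..., x[n+(dim-1)tau]]."""
    x = np.asarray(x, dtype=float)
    m = len(x) - (dim - 1) * tau
    return np.column_stack([x[i * tau : i * tau + m] for i in range(dim)])


def correlation_sum(points, r):
    """Fraction of ordered pairs of distinct points closer than r (max norm)."""
    dist = np.abs(points[:, None, :] - points[None, :, :]).max(axis=2)
    n = len(points)
    close = np.count_nonzero(dist < r) - n  # remove the n zero self-distances
    return close / (n * (n - 1))


def k2_estimate(x, dim, r, tau=1):
    """Estimate of the second-order entropy K2 in nats per sample:
    ln(C_dim(r) / C_{dim+1}(r)) / tau."""
    c_d = correlation_sum(delay_embed(x, dim, tau), r)
    c_d1 = correlation_sum(delay_embed(x, dim + 1, tau), r)
    return np.log(c_d / c_d1) / tau
```

A strictly periodic series gives an estimate close to zero, while independent uniform noise gives a large value that keeps growing as r shrinks; a chaotic series lies in between with a finite positive K2.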

The observation that the attractors underlying phoneme time series are
low-dimensional allowed us to apply the developing ideas of deterministic
state-space modeling in this framework. The results of comparing the global
approximation technique using quadratic polynomials with the LPC were given in
Chapter 4. For the same model order, the prediction error of the global
approximation technique was invariably lower than that of the covariance method
of the LPC.
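The comparison above can be illustrated with ordinary least squares. In the sketch below, a linear predictor without a constant term solves the same least-squares problem as the covariance method of the LPC (LPC codes usually solve its normal equations by Cholesky decomposition instead), and a quadratic polynomial in the same delayed samples plays the role of the global approximation. The feature layout, the function names and the use of `numpy.linalg.lstsq` are assumptions made for illustration, not the thesis's implementation.

```python
from itertools import combinations_with_replacement

import numpy as np


def quadratic_features(v):
    """Rows [1, v_i, v_i * v_j (i <= j)] for a matrix of state vectors v."""
    n, d = v.shape
    cols = [np.ones(n)] + [v[:, i] for i in range(d)]
    cols += [v[:, i] * v[:, j] for i, j in combinations_with_replacement(range(d), 2)]
    return np.column_stack(cols)


def fit_predictor(x, dim, quadratic):
    """Least-squares one-step predictor x[n] ~ f(x[n-1], ..., x[n-dim]).

    quadratic=False: linear predictor with no constant term (covariance-method
    LPC of order dim). quadratic=True: global quadratic polynomial map.
    Returns (coefficients, mean squared prediction error over the block).
    """
    x = np.asarray(x, dtype=float)
    # row n of v holds x[n-1], x[n-2], ..., x[n-dim] for target x[n]
    v = np.column_stack([x[dim - k : len(x) - k] for k in range(1, dim + 1)])
    target = x[dim:]
    design = quadratic_features(v) if quadratic else v
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    resid = target - design @ coef
    return coef, np.mean(resid ** 2)
```

On a logistic-map series, which is itself a quadratic map, the quadratic predictor with one delay is essentially exact while a fourth-order linear predictor leaves a large residual. This toy case only shows the mechanics and says nothing quantitative about speech.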
The local approximation technique cannot be used directly for speech
signal modeling in the context of storage and transmission because it
requires a prohibitively large model order. Therefore, a compromised
overlapping neighbourhood (CON) local approximation technique was proposed and
its prediction properties were compared with those of the global approximation
technique. It was observed that in 80% of the cases studied, the CON-local
approximation technique had a lower prediction error than the global approximation
technique.
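The CON scheme itself is not specified in this chapter, so the sketch below shows only the generic idea it builds on: a local linear approximation in the reconstructed state space, where an affine map is fitted to the k nearest neighbours of the current state and used to predict its successor. The neighbourhood size `k`, the maximum norm and the function name are assumptions; this is not the CON technique.

```python
import numpy as np


def local_linear_predict(x, dim, query, k):
    """Predict the successor of the state vector `query`, ordered as
    [x[n-1], ..., x[n-dim]], from its k nearest neighbours in the delay
    vectors of the history x, using an affine map fitted to those
    neighbours only."""
    x = np.asarray(x, dtype=float)
    # state vectors and their known successors in the history
    v = np.column_stack([x[dim - j : len(x) - j] for j in range(1, dim + 1)])
    nxt = x[dim:]
    # k nearest neighbours of the query (maximum norm)
    dist = np.max(np.abs(v - query), axis=1)
    idx = np.argsort(dist)[:k]
    # local affine fit: successor ~ c0 + c . state
    design = np.column_stack([np.ones(k), v[idx]])
    coef, *_ = np.linalg.lstsq(design, nxt[idx], rcond=None)
    return coef[0] + query @ coef[1:]
```

Because a separate affine map is fitted for every prediction, the number of parameters grows with the number of neighbourhoods visited, which illustrates why plain local approximation is unattractive for storage and transmission.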

The success of a particular modeling scheme depends on the choice of the
approximating function. As a first step, quadratic polynomials were studied. The
global approximation technique is convenient to implement, although its
performance depends largely on the choice of the approximating functional
representation. Intuitively, this dependence is reduced in the
CON-local approximation technique because modeling is confined to local
regions. However, a greater computational effort is required in this case. Hence,
the inherent trade-off between performance and computational
complexity remains, and the choice of a particular scheme depends upon the
requirements and the facilities available.

Most of the work on signal modeling in this thesis has been in the
nature of gathering evidence through numerical procedures and computer simulation.
An analytical framework can develop only when the underlying theory stabilizes.

Some suggestions for further work in this area are outlined below:

1. Our work on ascertaining the chaotic nature of phoneme time
series was limited to finding the second-order entropy. A more fundamental way
of ascertaining chaotic behaviour is to find the Lyapunov exponents for specific
configurations of the system. Finding all the Lyapunov exponents from a single
observed variable is a difficult task. Nonetheless, some algorithms have
been proposed recently [50,53,55] that can be used to estimate the largest Lyapunov
exponent with a fair degree of confidence. A positive largest Lyapunov exponent
is a sufficient condition to establish chaotic behaviour.

2. The effects of quantizing the error sequence on the prediction properties can be
studied and compared with those of the quantizers used with LPC models. Since the
state-space modeling schemes using the global and CON-local
approximation techniques can be used in ADPCM schemes for speech data
transmission, a complete study of the speech data compression aspects of the
suggested non-linear prediction models with quantizers would be desirable. Given the
better prediction properties of the global and CON-local approximation
techniques, their quantized error sequences are expected to give better reconstruction
than those of LPC models, which are currently the most popular predictors in ADPCM
schemes. Alternatively, fewer bits may be required to transmit the error
sequence with the new schemes to achieve the same quality of reconstruction as
with LPC models.

3. The ability to model non-linear dynamics leads to a method for reducing
external noise in the observations using non-linear averaging techniques [67].
This scheme has been studied on data obtained from the time evolution of differential
equations. State-space modeling techniques may be used to store speech data for
future use, in which case non-linear averaging techniques may be used to
improve the quality of the sound. Alternatively, speech signals generated in a noisy
environment may be modeled using the above schemes, and non-linear
averaging techniques may then be applied to improve the quality of the reconstructed
signal. Given these uses, the performance of non-linear averaging techniques applied
to noisy speech signals merits study.
4. The modeling schemes studied in the thesis were limited to the polynomial class
of functional representations. As a next step, the class of rational functions may
be studied. Apart from requiring only the solution of linear equations to obtain
their parameters, they have the advantage of extrapolating better than polynomials.
Other classes, e.g. neural network representations, may also be studied.

5. All the classic problems of speech processing can be studied in the framework
of the reconstructed state-space technique. One could possibly do away with the
artificial tool of time warping employed in speech recognition.

6. Considerable work is being done on applying Hidden Markov Models (HMMs) to speech
modeling. The correspondence between the reconstructed state-space models and
the hidden states of an HMM may also be studied in the above context of the
nature of speech signals.
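Suggestion 1 above hinges on the sign of the largest Lyapunov exponent. For a known one-dimensional map the exponent is simply the orbit average of ln|f'(x)|, which the sketch below computes for the logistic map x -> r x (1 - x). It uses the map's equation rather than observed data, so it is not one of the data-driven algorithms cited there; it only shows what a positive exponent (chaotic case r = 4, where the exponent equals ln 2) looks like next to a negative one (periodic case r = 3.2). The function name and iteration counts are arbitrary choices.

```python
import math


def logistic_lyapunov(r, x0=0.1, n=100_000, transient=1_000):
    """Lyapunov exponent of x -> r x (1 - x) in nats per iteration,
    computed as the orbit average of ln|f'(x)| = ln|r (1 - 2x)|."""
    x = x0
    # discard the transient so the average is taken on the attractor
    for _ in range(transient):
        x = r * x * (1.0 - x)
    total = 0.0
    for _ in range(n):
        total += math.log(abs(r * (1.0 - 2.0 * x)))
        x = r * x * (1.0 - x)
    return total / n
```

`logistic_lyapunov(4.0)` comes out close to ln 2 (about 0.693), while `logistic_lyapunov(3.2)` is negative because the orbit settles on a stable 2-cycle.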
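For suggestion 2, the essential structure is closed-loop DPCM: the prediction error is quantized, and the predictor runs on reconstructed samples so that the decoder can track the encoder. The sketch below assumes a fixed predictor passed in as a function and a uniform mid-tread quantizer with step size `step`. A real ADPCM coder would also adapt the step size and the predictor, and any of the non-linear predictors discussed in the thesis could be plugged in as `predict`. All names are hypothetical.

```python
import numpy as np


def dpcm_encode(x, predict, order, step):
    """Closed-loop DPCM encoder with a uniform mid-tread quantizer.

    `predict` maps the last `order` reconstructed samples (oldest first)
    to a prediction. Returns the integer codes and the encoder's copy of
    the reconstructed signal.
    """
    x = np.asarray(x, dtype=float)
    recon = np.zeros(len(x))
    codes = np.zeros(len(x), dtype=int)
    for n in range(len(x)):
        # predict from reconstructed samples only, as the decoder must
        pred = predict(recon[n - order:n]) if n >= order else 0.0
        codes[n] = int(np.round((x[n] - pred) / step))
        recon[n] = pred + codes[n] * step
    return codes, recon


def dpcm_decode(codes, predict, order, step):
    """Rebuild the signal from the transmitted codes alone."""
    recon = np.zeros(len(codes))
    for n in range(len(codes)):
        pred = predict(recon[n - order:n]) if n >= order else 0.0
        recon[n] = pred + codes[n] * step
    return recon
```

With the exact second-order predictor x[n] = 2cos(w) x[n-1] - x[n-2] for a sampled sinusoid, every reconstructed sample stays within step/2 of the input, the decoder reproduces the encoder's reconstruction exactly, and after the first two samples the codes stay in {-1, 0, 1}. A better predictor shrinks the error sequence, and hence the bits needed, in the same way.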
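Suggestion 3 refers to the non-linear averaging of [67]. As a stand-in, the sketch below implements a simple neighbourhood-averaging scheme in delay space; it is a swapped-in technique, not necessarily the method of [67]. The middle coordinate of each delay vector is replaced by the mean middle coordinate of all delay vectors within `eps` (maximum norm). The parameters `dim` and `eps` and the function name are assumptions.

```python
import numpy as np


def simple_nonlinear_filter(x, dim, eps):
    """One pass of neighbourhood averaging in delay space.

    The middle coordinate of every delay vector is replaced by the mean
    middle coordinate of all delay vectors lying within eps of it.
    The first and last dim // 2 samples are returned unchanged.
    """
    x = np.asarray(x, dtype=float)
    m = len(x) - dim + 1
    v = np.column_stack([x[i:i + m] for i in range(dim)])
    mid = dim // 2
    y = x.copy()
    for n in range(m):
        # neighbours of v[n], including v[n] itself
        near = np.max(np.abs(v - v[n]), axis=1) < eps
        y[n + mid] = v[near, mid].mean()
    return y
```

On a sinusoid corrupted by small white noise, one pass with `dim = 5` and `eps` of about three noise standard deviations lowers the RMS error with respect to the clean signal. For speech, `dim` and `eps` would have to be chosen from the reconstructed attractor.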
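Suggestion 4 notes that rational functions can still be fitted through linear equations. One common way, assumed here, is to fix the constant term of the denominator Q at 1 and multiply through, so that y Q(x) = P(x) is linear in the unknown coefficients. The one-dimensional sketch below fits that linearised problem by least squares. A state-space predictor would use multivariate P and Q, and the function names are hypothetical.

```python
import numpy as np


def fit_rational(x, y, num_deg, den_deg):
    """Fit y ~ P(x) / Q(x) with Q(x) = 1 + q_1 x + ... + q_k x^k by solving
    the linearised least-squares problem y = P(x) - y (Q(x) - 1).

    Returns coefficient arrays p and q in increasing powers of x.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    num_cols = [x ** i for i in range(num_deg + 1)]
    den_cols = [-y * x ** j for j in range(1, den_deg + 1)]
    design = np.column_stack(num_cols + den_cols)
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    p = coef[: num_deg + 1]
    q = np.concatenate(([1.0], coef[num_deg + 1 :]))
    return p, q


def eval_rational(p, q, x):
    """Evaluate P(x) / Q(x); np.polyval expects the highest power first."""
    x = np.asarray(x, dtype=float)
    return np.polyval(p[::-1], x) / np.polyval(q[::-1], x)
```

With exact samples of (1 + 2x)/(1 + 0.5x) on [0, 2], the fit recovers p = [1, 2] and q = [1, 0.5], and the fitted function gives 3.5 at x = 10, well outside the fitting range.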

Many complicated behaviours observed in electrical engineering systems,
which were earlier explained by hand-waving arguments, are now being analysed in
the light of chaos. Examples include non-linear circuits and phase-locked loops.
Another field that has opened up is the analysis and modeling of complex signals
in the paradigm of chaos. While we have studied some of these signal modeling
aspects for speech signals, they can also be applied to other classes
of signals, e.g. biomedical signals and seismic signals, some of which are now
known to be chaotic. These ideas can also be used to model image data for
compression applications.